There are over 20 million records in a self-referential database table, where one record may point to another record as a descendant.
Because of a bug in application code, there was no limit on recursion. The max was supposed to be 4. A few outlier records have between 5 and 5000 descendants (there could be more. I manually found one chain of 5000.
I need to find all the chains of 5+ and mark them for update/deletion. While the database is about 10GB, the recursive search is maxing out on diskspace and causing a failure (there was over over 100GB of workspace free)
Is there any way to make a recursive query work, or will I have to use another means and just iterate over the entire dataset (either in postgres or an external service)
Thinking aloud - why doesn't just finding every record with 5 descendants not work? Any chain longer than 5 would have at least 5 items.
Even without recursion you could build out a five-way self-join and any records that make it that far are guilty. I suppose this assumes your setup is non-cyclic.
David J.