Dimitrios Apostolou <jimis@xxxxxxx> writes:
> I have a query that goes through *billions* of rows and for the columns
> that have an infrequent "datatag" (HAVING count(test_datatag_n)<10) it
> selects all the IDs of the entries (array_agg(run_n)). Here is the full
> query:

> INSERT INTO infrequent_datatags_in_this_chunk
>     SELECT datatag, datatags.datatag_n, array_agg(run_n)
>     FROM runs_raw
>     JOIN datatags USING(datatag_n)
>     WHERE workitem_n >= 295
>       AND workitem_n < 714218
>       AND datatag IS NOT NULL
>     GROUP BY datatags.datatag_n
>     HAVING count(datatag_n) < 10
>        AND count(datatag_n) > 0  -- Not really needed because of the JOIN above
> ;

> The problem is that this is extremely slow (5 hours), most likely because
> it creates tens of gigabytes of temporary files as I see in the logs. I
> suspect that it is writing to disk the array_agg(run_n) of all entries and
> not only those HAVING count(datatag_n)<10.

Well, yes: the two aggregates (array_agg and count) are computed
concurrently in a single Aggregate plan node scanning the output of the
JOIN.  There's no way to apply the HAVING filter until after the
aggregation is finished.  I think this approach is basically forced by
the SQL standard's semantics for grouping/aggregation.

> How do I tell postgres to stop aggregating when count>=10?

The only way to do this would be to do two separate passes of
aggregation in separate sub-queries.  Perhaps like

WITH rare AS (
  SELECT datatag_n
  FROM runs_raw
  WHERE workitem_n >= 295
    AND workitem_n < 714218
    AND datatag IS NOT NULL
  GROUP BY datatag_n
  HAVING count(datatag_n) < 10
     AND count(datatag_n) > 0
)
INSERT INTO infrequent_datatags_in_this_chunk
SELECT datatag, datatags.datatag_n, array_agg(run_n)
FROM runs_raw
JOIN datatags USING(datatag_n)
JOIN rare USING(datatag_n)
GROUP BY datatags.datatag_n
;

I can't tell from what you said which level the workitem_n and datatag
conditions go at, so this is just a draft-quality query.
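(Editorial aside: the two-pass idea can be sketched procedurally; this
is illustration only, in Python rather than SQL, and the data, names,
and threshold below are invented, not taken from the thread.)

```python
# Two-pass aggregation sketch: pass 1 runs the cheap aggregate (count)
# over every row; pass 2 runs the expensive aggregate (the array_agg
# analogue) only for keys that passed the count filter, so large arrays
# are never built for frequent keys.
from collections import Counter, defaultdict

# Hypothetical (datatag, run_n) rows standing in for runs_raw.
rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5), ("c", 6)]
THRESHOLD = 3  # stand-in for the query's "count(datatag_n) < 10"

# Pass 1: count rows per key (the "rare" CTE).
counts = Counter(tag for tag, _ in rows)
rare = {tag for tag, n in counts.items() if n < THRESHOLD}

# Pass 2: accumulate run_n arrays only for the rare keys
# (the JOIN against "rare" in the rewritten query).
arrays = defaultdict(list)
for tag, run in rows:
    if tag in rare:
        arrays[tag].append(run)

print(dict(arrays))  # only the infrequent tags "a" and "c" get arrays
```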
But I think the structure is basically okay, given that you said
datatag_n is unique in datatags (so there's no need to join it in the
WITH step).

			regards, tom lane