On Mon, May 23, 2022 at 3:57 AM Mickael van der Beek <mickael.van.der.beek@xxxxxxxxx> wrote:
Hello Jeff,Sadly, the query you suggested won't work because you are only returning the first row of the matching inner query rows.
Sure, but the query I replaced did the same thing. (I thought that was what you wanted, but I guess that was just to get it to run fast enough to ever finish--in that case it is probably better to use EXPLAIN without the ANALYZE so that we can see the plan of the correct query). To get around that one-row limit you have to write it somewhat differently, getting rid of the ARRAY and adding an array_agg():
SELECT fu.*
FROM
fact_users AS fu
WHERE
fu.w2_page_idxs && (select array_agg(idx) from fact_pages where attribute_idxs && ARRAY[201]);
FROM
fact_users AS fu
WHERE
fu.w2_page_idxs && (select array_agg(idx) from fact_pages where attribute_idxs && ARRAY[201]);
This way of writing it is better, as it still works with the LIMIT 1 but also works without it. This still uses the indexes for me, at least when enable_seqscan is off.
The INNER JOIN version of the query will return all matching rows but also include duplicates:
You could just add a DISTINCT to get rid of the duplicates. Of course that will also take some time on a large returned data set, but probably less time than scanning a giant table. I think this is probably cleaner than the alternatives.
The reason I'm using integer arrays is because it is the only way I have found in PostgreSQL to get fast inclusion / exclusion checks on large datasets (hundreds of millions of values).Did I misunderstand your response?
I don't know if you misunderstood. I meant specifically the intarray extension. You can use integer arrays with built-in GIN indexes without help from the intarray extension. Maybe you know that already and are just saying that the extension is even faster than the built-in indexed operators are and you need that extra speed.
Cheers,
Jeff