Hello Jeff,
Sadly, the query you suggested won't work because you are only returning the first row of the matching inner query rows.
Example:
SELECT
u.idx,
u.page_idxs
FROM
(
VALUES
(1, ARRAY[11, 21, 31]),
(2, ARRAY[12, 21, 32]),
(3, ARRAY[13, 23, 31])
)
AS u(idx, page_idxs)
WHERE
u.page_idxs && ARRAY[(
SELECT
p.idx
FROM
(
VALUES
(11, ARRAY[101, 201, 301]),
(21, ARRAY[102, 201, 302]),
(13, ARRAY[103, 203, 301])
)
AS p(idx, attribute_idxs)
WHERE
p.attribute_idxs && ARRAY[201]
FETCH FIRST 1 ROWS ONLY
)]
;
This query only returns one row while it should actually return two:
1 {11,21,31}
The INNER JOIN version of the query will return all matching rows but also include duplicates:
SELECT
u.idx,
u.page_idxs
FROM
(
VALUES
(1, ARRAY[11, 21, 31]),
(2, ARRAY[12, 21, 32]),
(3, ARRAY[13, 23, 31])
)
AS u(idx, page_idxs)
INNER JOIN
(
SELECT
p.idx
FROM
(
VALUES
(11, ARRAY[101, 201, 301]),
(21, ARRAY[102, 201, 302]),
(13, ARRAY[103, 203, 301])
)
AS p(idx, attribute_idxs)
WHERE
p.attribute_idxs && ARRAY[201]
)
AS p2
ON u.page_idxs && ARRAY[p2.idx]
;
Results:
1 {11,21,31}
1 {11,21,31}
2 {12,21,32}
As far as I know, the the IN + sub-_expression_ query can't work since the left side of the operation is an array of integers and the right side a set of rows with a single integer column.
The reason I'm using integer arrays is because it is the only way I have found in PostgreSQL to get fast inclusion / exclusion checks on large datasets (hundreds of millions of values).
Did I misunderstand your response?
Thank you for the ongoing help,
Mickael
On Mon, May 23, 2022 at 4:45 AM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Fri, May 20, 2022 at 6:42 AM Mickael van der Beek <mickael.van.der.beek@xxxxxxxxx> wrote:Query:EXPLAIN (
ANALYZE,
VERBOSE,
COSTS,
BUFFERS,
TIMING
)
SELECT
fu.w2_page_idxs
FROM
fact_users
AS fu
WHERE
EXISTS (
SELECT
FROM
(
SELECT
ARRAY[idx] AS page_idx
FROM
fact_pages
WHERE
attribute_idxs && ARRAY[300000160]
FETCH FIRST 1 ROWS ONLY
)
AS fp
WHERE
fu.w2_page_idxs && fp.page_idx
)
;Without any surprises, the planner is using a sequential scan on the "fact_users" table which is very large instead of using the GIN index set on the "w2_page_idxs" column.For me, using the subquery in and _expression_, instead of the EXISTS, does get it to use the gin index. And I think it must give the same results.SELECT
fu.w2_page_idxs
FROM fact_users AS fu
WHERE
fu.w2_page_idxs && ARRAY[(select idx from fact_pages where attribute_idxs && ARRAY[3003] FETCH FIRST 1 ROWS ONLY)];But why are you using intarray? That is unnecessary here, and by creating ambiguity about the array operators it might be harmful.Cheers,Jeff