Re: Array of integer indexed nested-loop semi join

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Jeff,

Sorry for the delay, here are the EXPLAIN ANALYSE results for one single row in the inner-query:

Nested Loop Semi Join  (cost=10000000993.81..10004731160.70 rows=536206 width=28) (actual time=93765.182..93765.183 rows=0 loops=1)
  Output: fu.w2_page_idxs
  Join Filter: (fu.w2_page_idxs && (ARRAY[fact_pages.idx]))
  Rows Removed by Join Filter: 53762825
  Buffers: shared hit=569194 read=2821768
  I/O Timings: read=56586.955
  ->  Seq Scan on public.fact_users fu  (cost=10000000000.00..10003925857.68 rows=53620568 width=28) (actual time=79.139..67423.779 rows=53762825 loops=1)
        Output: fu.w2_page_idxs
        Buffers: shared hit=567884 read=2821768
        I/O Timings: read=56586.955
  ->  Materialize  (cost=993.81..994.50 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=53762825)
        Output: (ARRAY[fact_pages.idx])
        Buffers: shared hit=148
        ->  Limit  (cost=993.81..994.48 rows=1 width=32) (actual time=26.382..26.383 rows=1 loops=1)
              Output: (ARRAY[fact_pages.idx])
              Buffers: shared hit=148
              ->  Bitmap Heap Scan on public.fact_pages  (cost=993.81..70645.00 rows=103556 width=32) (actual time=26.378..26.379 rows=1 loops=1)
                    Output: ARRAY[fact_pages.idx]
                    Recheck Cond: (fact_pages.attribute_idxs && '{300000160}'::integer[])
                    Heap Blocks: exact=1
                    Buffers: shared hit=148
                    ->  Bitmap Index Scan on fact_pages_attribute_idxs_int  (cost=0.00..967.92 rows=103556 width=0) (actual time=14.865..14.865 rows=101462 loops=1)
                          Index Cond: (fact_pages.attribute_idxs && '{300000160}'::integer[])
                          Buffers: shared hit=147
Query Identifier: 6779965332684941204
Planning:
  Buffers: shared hit=2
Planning Time: 0.162 ms
JIT:
  Functions: 10
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.507 ms, Inlining 9.797 ms, Optimization 54.902 ms, Emission 14.314 ms, Total 80.521 ms
Execution Time: 93766.772 ms

Query:

EXPLAIN (
  ANALYZE,
  VERBOSE,
  COSTS,
  BUFFERS,
  TIMING
)
SELECT
  fu.w2_page_idxs
FROM
  fact_users
    AS fu
WHERE
  EXISTS (
    SELECT
    FROM
      (
        SELECT
          ARRAY[idx] AS page_idx
        FROM
          fact_pages
        WHERE
          attribute_idxs && ARRAY[300000160]
        FETCH FIRST 1 ROWS ONLY
      )
        AS fp
    WHERE
      fu.w2_page_idxs && fp.page_idx
  )
;

Without any surprises, the planner is using a sequential scan on the "fact_users" table which is very large instead of using the GIN index set on the "w2_page_idxs" column.

Link to the query plan visualiser: https://explain.dalibo.com/plan/1vC

Thank you very much in advance,

Mickael

On Wed, Apr 27, 2022 at 4:54 PM Mickael van der Beek <mickael.van.der.beek@xxxxxxxxx> wrote:
Hello Jeff,

I have waited a few hours without the query ever finishing which is the reason I said "never finishes".
Especially because the INNER JOIN version finishes within a few minutes while being combinatorial and less efficient.
The query probably only does sequential scans.

You will find the query plan using EXPLAIN here:
- Visual query plan: https://explain.dalibo.com/plan#plan
- Raw query plan: https://explain.dalibo.com/plan#raw

Thanks for your help,

Mickael

On Wed, Apr 27, 2022 at 4:28 PM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Wed, Apr 27, 2022 at 8:19 AM Mickael van der Beek <mickael.van.der.beek@xxxxxxxxx> wrote:

The last query does not finish after waiting for more than 15 minutes.
(The temporary view creation is very fast and required due to the same query in a CTE greatly reducing performance (by more than 5 min.) due to the optimisation barrier I'm guessing.)

How much over 15 minutes?  20 minutes doesn't seem that long to wait to get a likely definitive answer.  But at the least show us the EXPLAIN without ANALYZE of it, that should take no milliseconds.

And what does it mean for something to take 5 minutes longer than "never finishes"?

(Also, putting every or every other token on a separate line does not make it easier to read)

Cheer,

Jeff



--

Mickael van der Beek

Web developer & Security analyst



--

Mickael van der Beek

Web developer & Security analyst

mickael.van.der.beek@xxxxxxxxx


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux