query plan using partial index expects a much larger number of rows than is possible

Hello,
I have a large table (about 13 million rows) of customer order information. Since most of those rows are for orders that have already been fulfilled, I have a partial index to quickly zero in on the rows that have not been fulfilled. This works well, but today, when joining to another large table on its primary key, I noticed that even though the planner was using my partial index, it chose a merge join against the second large table instead of the nested loop I would have expected.
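To make that concrete, the query looks roughly like this (the second table and its join column here are just placeholders, not the real names; the join is on that table's primary key):

select oi.id
from orderitems oi
join other_bigtable t on t.id = oi.other_bigtable_id  -- placeholder names
where LEAST(oi.committed, oi.quantity) > oi.fulfilled;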

Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do a simple query that does an index scan on my partial index:

=> \d orderitems_committed_unfulfilled 
Index "public.orderitems_committed_unfulfilled"
 Column |  Type  | Key? | Definition 
--------+--------+------+------------
 id     | bigint | yes  | id
btree, for table "public.orderitems", predicate (LEAST(committed, quantity) > fulfilled)
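That definition corresponds to an index created with something like:

create index orderitems_committed_unfulfilled on orderitems (id)
  where LEAST(committed, quantity) > fulfilled;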

=> explain (analyze, buffers) select oi.id from orderitems oi where LEAST(oi.committed, oi.quantity) > oi.fulfilled;
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using orderitems_committed_unfulfilled on orderitems oi  (cost=0.41..31688.23 rows=2861527 width=8) (actual time=0.039..2.092 rows=2274 loops=1)
   Heap Fetches: 2493
   Buffers: shared hit=1883
 Planning Time: 0.110 ms
 Execution Time: 2.255 ms
(5 rows)

So nice and quick, but the planner estimated 2861527 rows instead of the 2274 I actually have. That explains why it thought a merge join with my other large table would be cheaper than a nested loop over the 2k rows. I would have expected the planner to know there's no way it will get back nearly 3 million rows, though, given that:

=> select relname, relpages, reltuples from pg_class where relname = 'orderitems_committed_unfulfilled';
             relname              | relpages | reltuples 
----------------------------------+----------+-----------
 orderitems_committed_unfulfilled |     3051 |      2112
(1 row)

It knows from the index itself that there are only 2k-ish of them. The ~2.8 million estimate is the same one the planner produces if I disable index scans and it falls back to a seq scan, so I'm assuming it's just the guess from the column statistics and the planner is not using the size of the partial index.
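(For that comparison I just turned off index scans for the session, roughly:

set enable_indexonlyscan = off;
set enable_indexscan = off;
set enable_bitmapscan = off;
explain select oi.id from orderitems oi where LEAST(oi.committed, oi.quantity) > oi.fulfilled;

which produces a Seq Scan with the same row estimate.)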

I'm running:
=> select version();
                                          version                                          
-------------------------------------------------------------------------------------------
 PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc, a 97d579287 p 0be2109a97, 64-bit
(1 row)

I'm wondering if the behavior I'm seeing is expected in 12.4, and if so, whether it changes in a later version or whether I should file an enhancement request. Or, if it's not expected, is there something I'm doing wrong, or should I file a bug?

Thanks for your time.

-- 
  Olivier Poquet
  opoquet@xxxxxxxxxxx




