why choosing an hash index instead of the btree version even if the cost is lower?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,
I'm just having a doubt about the choice of the planner for a small
example table.
I've a table with a numeric column (integer), and I've created two
indexes on such column, one btree and one hash. The hash results much
larger as the btree, but what puzzles me is that executing an equality
simple query, the system chooses the hash index (that has a final cost
of 8984.08 while the btree index would have a final cost a little
lower (8901.94).

The only difference I can spot in the EXPLAIN plans is that the btree
index has an initial cost, but I don't think this is the reason, since
it should be the final cost what matters, right?

Now, even if the two costs are comparable, why does the optimizer
chooses to use the larger hash index? What am I missing here and where
do I have to dig?
Using EXPLAIN ANALYZE shows that the two indexes are very similar
timings, so while using the hash index is clearly not a wrong choice,
I'm wondering why preferring a bigger index.

Please note that the table has been manually ANALYZEd and is not clustered.


Thanks,
Luca


testdb=> select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.2.1
20220127 (Red Hat 11.2.1-9), 64-bit

testdb=> select relname, pg_size_pretty( pg_relation_size( oid )) from
pg_class where relname = 'articoli';
relname  | pg_size_pretty
----------+----------------
articoli | 134 MB

testdb=> \d articoli
Table "public.articoli"
Column   |  Type   | Collation | Nullable |           Default
-----------+---------+-----------+----------+------------------------------
pk        | integer |           | not null | generated always as identity
codice    | text    |           | not null |
prezzo    | integer |           |          | 0
citta     | text    |           |          |
magazzino | integer |           |          | 1
visibile  | boolean |           |          | true




testdb=> create index articoli_prezzo_idx on articoli(prezzo);
testdb=> create index articoli_prezzo_hash_idx on articoli using hash (prezzo);
CREATE INDEX
testdb=> select relname, pg_size_pretty( pg_relation_size( oid )) from
pg_class where relname like 'articoli%idx%';
relname          | pg_size_pretty
--------------------------+----------------
articoli_prezzo_idx      | 15 MB
articoli_prezzo_hash_idx | 47 MB


testdb=> explain select * from articoli where prezzo = 77;
QUERY PLAN
------------------------------------------------------------------------------------------------
Index Scan using articoli_prezzo_hash_idx on articoli
(cost=0.00..8984.08 rows=5033 width=28)
Index Cond: (prezzo = 77)



testdb=> begin;
BEGIN
testdb=*> drop index articoli_prezzo_hash_idx;
DROP INDEX
testdb=*> explain select * from articoli where prezzo = 77;
QUERY PLAN
-------------------------------------------------------------------------------------------
Index Scan using articoli_prezzo_idx on articoli  (cost=0.42..8901.94
rows=5033 width=28)
Index Cond: (prezzo = 77)
(2 rows)

testdb=*> rollback;
ROLLBACK



If it does matter, these is an excerpt from the pg_stats:

testdb=> select tablename, attname, n_distinct, most_common_vals,
most_common_freqs from pg_stats where tablename = 'articoli' and
attname = 'prezzo';
tablename | attname | n_distinct | most_common_vals |
most_common_freqs
-----------+---------+------------+------------------+-----------------------------------
articoli  | prezzo  |        200 | {62,147,154}     |
{0.0062666666,0.0060333335,0.006}


And here the EXPLAIN ANALYZE outputs:

testdb=> explain analyze select * from articoli where prezzo = 77;
                                                                 QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using articoli_prezzo_hash_idx on articoli
(cost=0.00..8971.95 rows=5026 width=27) (actual time=0.013..5.821
rows=5200 loops=1)
   Index Cond: (prezzo = 77)
 Planning Time: 0.108 ms
 Execution Time: 6.037 ms
(4 rows)

testdb=> begin;
BEGIN
testdb=*> drop index articoli_prezzo_hash_idx;
DROP INDEX
testdb=*> explain analyze select * from articoli where prezzo = 77;
                                                               QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using articoli_prezzo_idx on articoli  (cost=0.42..8891.65
rows=5026 width=27) (actual time=0.034..6.561 rows=5200 loops=1)
   Index Cond: (prezzo = 77)
 Planning Time: 0.165 ms
 Execution Time: 6.843 ms
(4 rows)

testdb=*> rollback;
ROLLBACK





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux