Re: Postgres optimizer choosing wrong index

Jack Orenstein <jack.orenstein@xxxxxxx> · Fri, 24 Oct 2008 14:21:50 -0400

Tom Lane wrote:
Jack Orenstein <jack.orenstein@xxxxxxx> writes:
I may have simplified too far. Our application runs a number of
different queries. All our WHERE clauses restrict dh and fh. For a
given pair of (dh, fh) values, the initial query should come up empty
and then insert this pair, and then there is further processing
(SELECT, UPDATE). Something is causing a huge number of index row
reads (according to pg_stat_user_indexes) but only in tables that have
been vacuumed.

Well, that's a bit more concrete but it's still difficult to tell where
the problem is.  Are you by any chance creating new tables and then
vacuuming them while they're still empty?  That would cause
pg_class.relpages to get set to zero, and 7.4.x is not bright enough to
change plans when you later fill the table (until you do another vacuum
or analyze).  However, I think 7.4 would always choose a seqscan on a
table it thinks is zero-size, so I'm not sure that that's what's going
on here.

I've collected much more detailed information on this problem.

(Recap: In 7.4, table T(int dh, int fh, int nm, ...) has indexes idx_df on (dh, 
fh), and idx_dn on (dh, nm). Queries of the form "... where dh = ? and fh = ?" 
appear to be using the (dh, nm) index leading to horrendous performance in cases 
where there are very few unique dh values, and T has been vacuumed.)

The optimizer is definitely picking the wrong index even when the query 
restricts dh to a value actually present in the table. (Yesterday, I was misled 
by running EXPLAIN with a value not in the table.) Tom's speculation above is 
mostly correct except that, as shown below, the optimizer is not choosing a seqscan.

EXPERIMENT:

- I created two schemas, NOVAC and VAC, each with a table T as described above.

- Before loading data, I ran VACUUM ANALYZE on VAC.T.

- I then started loading data. The workload is a mixture of INSERT,   SELECT and 
UPDATE. For SELECT and UPDATE the WHERE clause always includes "dh = ? and fh = ?".

PG_STAT RESULTS:

Stats collection was enabled. Here is sample output during the run, (yes, those 
are real numbers, I just happened to run the query when the 2nd row counters hit 
those nice round numbers).

    ris=# select schemaname, indexrelname, idx_scan, idx_tup_read, 
idx_tup_fetch from pg_stat_user_indexes where schemaname in ('novac', 'vac') and 
relname = 'external_file' order by 1, 2;
     schemaname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch
    ------------+--------------+----------+--------------+---------------
     novac      | idx_dn       |        0 |            0 |             0
     novac      | idx_dh       |    25000 |        20000 |         20000
     vac        | idx_dn       |    16093 |     25991728 |      25991728
     vac        | idx_dh       |        0 |            0 |             0

This shows the different index usage in the two schemas. The schema containing 
the vacuumed table used idx_dn exclusively; in the other schema, idx_dh was used 
exclusively.

EXPLAIN PLAN RESULTS:

This query shows the distribution of T.dh values in NOVAC:

    ris=# select dh, count(*) from novac.t group by dh;
     dh        | count
    -----------+-------
     280433162 |  5000
     601454890 |  5000

Using one of these dh values, the optimizer selects idx_df as I expected (the fh 
values are nearly unique; 0 is the median of the possible values):

    ris=# explain select * from novac.t where dh = 280433162 and fh = 0;
                                        QUERY PLAN 

    ------------------------------------------------------------------
     Index Scan using idx_df on t  (cost=0.00..4.83 rows=1 width=454)
       Index Cond: ((dh = 280433162) AND (fh = 0))
    (2 rows)

But in the VAC schema:

    ris=# select dir_hash, count(*) from vac.t group by dh;
     dir_hash  | count
    -----------+-------
     758082190 |  5000
     980351022 |  5000
    (2 rows)

    ris=# explain select * from vac.t where dh = 758082190 and fh = 0;
                                                QUERY PLAN 

    ------------------------------------------------------------------
     Index Scan using idx_dn on t  (cost=0.00..3.68 rows=1 width=454)
       Index Cond: (dh = 758082190)
       Filter: (fh = 0)
    (3 rows)

Here are page and tuple counts:

    ris=# select n.nspname, relname, relpages, reltuples from pg_class, 
pg_namespace n where relname = 't' and relnamespace = n.oid;
     nspname  | relname | relpages | reltuples
    ----------+---------+----------+-----------
     novac    | t       |       10 |      1000
     vac      | t       |        0 |         0

From Tom Lane's earlier email, the vacuum prior to the load explains the 0s in 
the second row.

pg_stats has no data for table T in either schema. In the case of the VAC 
schema, I'm guessing this is because the table was actually empty when analyzed.

CONCLUSION:

The optimizer is objectively making the wrong choice. The pg_stat results 
included above, (as well as the block-level counts), show that the idx_df index 
should have been used. While the dh part of the index is not highly selective, 
the fh part is. And with very high ratios of fh to dh values, (as in my 
experiment), it matters a lot. The transaction rates measured by the test 
program make this clear even without looking at pg_stat numbers.

Admittedly, the optimizer doesn't have much to go on. But I would think that 
with a WHERE clause of the form "dh = ? and fh = ?", choosing the (dh, fh) index 
over any other index would be a reasonable heuristic.

Jack

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general