Search Postgresql Archives

Re: Postgres optimizer choosing wrong index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane wrote:
Jack Orenstein <jack.orenstein@xxxxxxx> writes:
- I created two schemas, NOVAC and VAC, each with a table T as described above.

- Before loading data, I ran VACUUM ANALYZE on VAC.T.

- I then started loading data. The workload is a mixture of INSERT, SELECT and UPDATE. For SELECT and UPDATE the WHERE clause always includes "dh = ? and fh = ?".

Basically your problem here is that vacuum records the size of the table
as zero (in pg_class.relpages/reltuples) and that causes the computed
costs of the two indexscans to be exactly the same, so it's a tossup
which one gets used.  (In recent versions I think the index with higher
OID would typically get chosen in a tie, but I forget if 7.4 worked that
way.)

8.0 and up are smart enough not to believe pg_class.relpages anymore
after you've loaded a lot of data, but 7.4 isn't.  In testing similar
cases here, I get reasonable cost estimates and a sane plan choice
from 7.4 so long as the stats are up to date.

Bottom line: you need to vacuum (or preferably analyze) *after*
initially populating a table, not before.

OK, I've added this behavior to my application. As the table is being loaded, I run VACUUM ANALYZE every 500 inserts, until we get to size 10,000. I know this is working because of application-level logging, and because I see relpages and reltuples go up.

EXPLAIN says that the correct index is being used -- it didn't used to. However, pg_stat* says otherwise. In my test, I have exactly one dh value. Running EXPLAIN with this value produces a plan using idx_dh (the correct index), but pg_stats says that idx_dn is being used (see psql session below).

This eventually works itself out. Eventually, the pg_stats for idx_dh start going up, showing that that index is eventually being used. But this discrepancy between EXPLAIN and actual query execution is making life very difficult.

Is the discrepancy between EXPLAIN and pg_stats due to some sort of caching per connection? E.g., a connection that uses one plan for a query is stuck with that plan for that query?

What would really be nice is a logging option that reported the execution plan actually used for a query.

Jack



ris=# \d t;
               Table "vac.t"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 dh           | integer                | not null
 fh           | integer                | not null
 nm           | bigint                 |
...
Indexes:
    "idx_dn" btree (dh, nm)
    "idx_dh" btree (dh, fh)

ris=# select dh, count(*) from t group by dh;
 dh        | count
-----------+-------
 589849733 | 19890
(1 row)

ris=# explain select * from t where dh = 589849733 and fh = 0;
                                    QUERY PLAN
------------------------------------------------------------------
 Index Scan using idx_dh on t  (cost=0.00..5.26 rows=2 width=570)
   Index Cond: ((dh = 589849733) AND (fh = 0))
(2 rows)


ris=# select schemaname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch from pg_stat_user_indexes where schemaname = 'vac' and relname = 't' order by 1, 2;
 schemaname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch
------------+--------------+----------+--------------+---------------
 vac        | idx_dn       |    31315 |    122773990 |     122773990
 vac        | idx_dh       |        0 |            0 |             0



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux