On Wed, Apr 8, 2009 at 9:42 AM, Marinos Yannikos <mjy@xxxxxxxxxxx> wrote: > It seems that ANALYZE does not really sample text column values as much as > it could. We have some very bad query plans resulting from this: > > ... > -> Bitmap Index Scan on m_pkey (cost=0.00..28.61 rows=102 width=0) > (actual time=171.824..171.824 rows=683923 loops=1) > Index Cond: ((e >= 'ean'::text) AND (e < 'eao'::text) > > This gets even worse for longer strings, where we know that many matching > rows exist: > > # explain analyze select substring(e,5) from m where id=257421 and e ~ > '^ean='; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------- > Index Scan using m_pkey on m (cost=0.00..12.50 rows=1 width=60) (actual > time=1623.795..1703.958 rows=18 loops=1) > Index Cond: ((e >= 'ean='::text) AND (e < 'ean>'::text)) > Filter: ((e ~ '^ean='::text) AND (id = 257421)) > Total runtime: 1703.991 ms > (4 rows) > > Here it would be much better to use the existing index on "id" (btree) first > because the current index condition selects 683k rows whereas the result > contains 18 rows. Using the index on id would yield 97 rows to filter. > > Is it possible to work around this problem somehow, other than adding > partial indexes for the ~ / LIKE condition (when it's constant) or a > 2-dimensional index? > > (what exactly does ANALYZE look at for text columns? in our case, about 7% > of the rows match the index condition, so it seems that left-anchored > regexp/like matches are not evaluated using the gathered most-common-value > list at all) What are you using for default_statistics_target? You can see the gathered data in pg_statistic. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance