Statistics use with functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




I'm running a rather complex query and noticed a peculiarity in the usage of statistics that seriously affects the plan generated. I can extract the relevant bit:

modmine-r9=# select * from pg_stats where tablename = 'geneflankingregion' AND attname IN ('distance', 'direction');
 schemaname |     tablename      |  attname  | null_frac | avg_width | n_distinct |         most_common_vals         |               most_common_freqs                | histogram_bounds | correlation
------------+--------------------+-----------+-----------+-----------+------------+----------------------------------+------------------------------------------------+------------------+-------------
 public     | geneflankingregion | distance  |         0 |         6 |          5 | {5.0kb,0.5kb,1.0kb,2.0kb,10.0kb} | {0.201051,0.200798,0.200479,0.199088,0.198583} |                  |    0.197736
 public     | geneflankingregion | direction |         0 |        10 |          2 | {downstream,upstream}            | {0.500719,0.499281}                            |                  |    0.495437
(2 rows)

modmine-r9=# SELECT COUNT(*) FROM geneflankingregion;
 count
--------
 455020
(1 row)

modmine-r9=# explain analyse SELECT * FROM geneflankingregion WHERE distance = '10.0kb' AND direction = 'upstream';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on geneflankingregion  (cost=0.00..15507.30 rows=45115 width=213) (actual time=0.053..181.764 rows=45502 loops=1)
   Filter: ((distance = '10.0kb'::text) AND (direction = 'upstream'::text))
 Total runtime: 227.245 ms
(3 rows)

modmine-r9=# explain analyse SELECT * FROM geneflankingregion WHERE LOWER(distance) = '10.0kb' AND LOWER(direction) = 'upstream';
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on geneflankingregion
   (cost=66.95..88.77 rows=11 width=213)
   (actual time=207.555..357.359 rows=45502 loops=1)
   Recheck Cond: ((lower(distance) = '10.0kb'::text) AND (lower(direction) = 'upstream'::text))
   ->  BitmapAnd
         (cost=66.95..66.95 rows=11 width=0)
         (actual time=205.978..205.978 rows=0 loops=1)
         ->  Bitmap Index Scan on geneflankingregion__distance_equals
               (cost=0.00..31.34 rows=2275 width=0)
               (actual time=79.380..79.380 rows=91004 loops=1)
               Index Cond: (lower(distance) = '10.0kb'::text)
         ->  Bitmap Index Scan on geneflankingregion__direction_equals
               (cost=0.00..35.35 rows=2275 width=0)
               (actual time=124.639..124.639 rows=227510 loops=1)
               Index Cond: (lower(direction) = 'upstream'::text)
 Total runtime: 401.740 ms
(8 rows)

When I wrap the fields in the constraints in a LOWER() function, the planner stops looking at the statistics and makes a wild guess, even though it is very obvious from just looking what the result should be. Embedded in a much larger query, the inaccuracy in the number of rows (11 instead of 45502) causes major planning problems. Also, why does the BitmapAnd say zero actual rows?

I understand this probably isn't Priority No. 1, and there are some interesting corner cases when n_distinct is higher than the histogram width, but would it be possible to fix this one up?

Matthew

--
I would like to think that in this day and age people would know better than
to open executables in an e-mail. I'd also like to be able to flap my arms
and fly to the moon.                                    -- Tim Mullen

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux