Hi,

On 02/17/2017 11:19 PM, Hustler DBA wrote:
I am seeing this strange behavior and I don't know if this is by design in Postgres. I have an index on a column which is defined as "character varying(255)". When the value I am searching for is of a certain length, the optimizer uses the index, but when the value is long, the optimizer doesn't use the index and does a seq scan on the table instead. Is this by design? How can I make the optimizer use the index no matter what the size/length of the value being searched for?
AFAIK there are no such checks, i.e. the optimizer does not consider the length of the value when deciding between scan types.
PostgreSQL version: 9.4
That's good to know, but we also need information about the table involved in your queries. I'd bet the table is tiny (it seems to be just 12 pages, so ~100kB), making the indexes rather useless.
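If you want to double-check that, the size is easy to verify directly. A minimal sketch, assuming the table really is named 'tab' as in the plans below (relpages is counted in blocks of 8kB with the default block size, so 12 pages is roughly 100kB):

    -- assumes the table is literally named 'tab'
    SELECT pg_relation_size('tab'), pg_total_relation_size('tab');

    SELECT relpages, reltuples
      FROM pg_class
     WHERE relname = 'tab';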
my_db=# explain (analyze, buffers) select count(*) from tab where ID = '01625cfa-2bf8-45cf' ;
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.29..8.30 rows=1 width=0) (actual time=0.048..0.048 rows=1 loops=1)
   Buffers: shared read=2
   ->  Index Only Scan using tab_idx1 on tab  (cost=0.27..8.29 rows=1 width=0) (actual time=0.043..0.043 rows=0 loops=1)
         Index Cond: (ID = '01625cfa-2bf8-45cf'::text)
         Heap Fetches: 0
         Buffers: shared read=2
 Planning time: 0.250 ms
 Execution time: 0.096 ms
(8 rows)

my_db=# explain (analyze, buffers) select count(*) from tab where ID = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea' ;
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14.80..14.81 rows=1 width=0) (actual time=0.115..0.115 rows=1 loops=1)
   Buffers: shared hit=12
   ->  Seq Scan on tab  (cost=0.00..14.79 rows=5 width=0) (actual time=0.031..0.108 rows=5 loops=1)
         Filter: ((ID)::text = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'::text)
         Rows Removed by Filter: 218
         Buffers: shared hit=12
 Planning time: 0.122 ms
 Execution time: 0.180 ms
(8 rows)
The only difference I see is that for the long value the planner expects 5 rows, while for the short one it expects 1 row. That may seem a bit strange, but I'd bet it finds the short value in some statistic (MCV, histogram) and so can provide a very accurate estimate, while for the longer one it ends up using some default (0.5% for equality IIRC) or a value deduced from ndistinct. Or something like that.
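A quick way to check what ANALYZE actually collected for the column is pg_stats. Just a sketch, assuming the column is named 'id' and the table 'tab':

    -- assumes table 'tab' and column 'id'
    SELECT null_frac, n_distinct,
           most_common_vals, most_common_freqs,
           histogram_bounds
      FROM pg_stats
     WHERE tablename = 'tab'
       AND attname = 'id';

If a value shows up in most_common_vals, the planner uses the stored frequency for it, which is where the more accurate estimate comes from.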
The differences between the two plans are rather negligible, both in terms of cost (8.30 vs. 14.81) and runtime (0.1 vs. 0.2 ms). The choice of a sequential scan seems perfectly reasonable for such a tiny table.
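If you want to see what the planner thinks the index plan would cost for the long value, a throw-away experiment like this works (enable_seqscan is only meant for this kind of testing, not as a permanent setting, and again the table/column names are assumed):

    BEGIN;
    SET LOCAL enable_seqscan = off;   -- forces the planner away from seq scans for this transaction
    EXPLAIN (ANALYZE, BUFFERS)
      SELECT count(*) FROM tab
       WHERE id = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea';
    ROLLBACK;

If the index scan comes out more expensive than the seq scan, the seq scan is simply the cheaper plan for a 12-page table.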
FWIW it's impossible to draw conclusions from just two EXPLAIN ANALYZE executions. The timing instrumentation from EXPLAIN ANALYZE may have a significant impact (different for each plan!). You also need to test with more values and longer runs, not just a single execution (there are caching effects etc.).
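For instance, something along these lines gives more comparable numbers: TIMING OFF drops most of the per-node instrumentation overhead, and repeating the plain query with \timing enabled in psql helps average out caching effects (again just a sketch against the assumed 'tab' table):

    -- less instrumentation overhead than a full EXPLAIN ANALYZE
    EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
      SELECT count(*) FROM tab
       WHERE id = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea';

    -- in psql: enable client-side timing and run the query repeatedly
    \timing on
    SELECT count(*) FROM tab
     WHERE id = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea';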
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services