Re: Number of characters in column preventing index usage

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Fri, 17 Feb 2017 23:49:29 +0100

Hi,

On 02/17/2017 11:19 PM, Hustler DBA wrote:
I am seeing this strange behavior, I don't know if this is by design by
Postgres.

I have an index on a column which is defined as "character
varying(255)". When the value I am searching for is of a certain length,
the optimizer uses the index but when the value is long, the optimizer
doesn't use the index but does a seq scan on the table. Is this by
design? How can I make the optimizer use the index no matter what the
size/length of the value being searched for?

AFAIK there are no such checks, i.e. the optimizer does not consider the 
length of the value when deciding between scan types.

PostgreSQL version: 9.4

That's good to know, but we also need information about the table 
involved in your queries. I'd bet the table is tiny (it seems to be just 
12 pages, so ~100kB), making the indexes rather useless.

my_db=# explain (analyze, buffers) select count(*) from tab where ID =
'01625cfa-2bf8-45cf' ;
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.29..8.30 rows=1 width=0) (actual time=0.048..0.048
rows=1 loops=1)
   Buffers: shared read=2
   ->  Index Only Scan using tab_idx1 on tab  (cost=0.27..8.29 rows=1
width=0) (actual time=0.043..0.043 rows=0 loops=1)
         Index Cond: (ID = '01625cfa-2bf8-45cf'::text)
         Heap Fetches: 0
         Buffers: shared read=2
 Planning time: 0.250 ms
 Execution time: 0.096 ms
(8 rows)

my_db=# explain (analyze, buffers) select count(*) from tab where ID =
'01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea' ;
                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14.80..14.81 rows=1 width=0) (actual time=0.115..0.115
rows=1 loops=1)
   Buffers: shared hit=12
   ->  Seq Scan on tab  (cost=0.00..14.79 rows=5 width=0) (actual
time=0.031..0.108 rows=5 loops=1)
         Filter: ((ID)::text = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'::text)
         Rows Removed by Filter: 218
         Buffers: shared hit=12
 Planning time: 0.122 ms
 Execution time: 0.180 ms
(8 rows)

The only difference I see is that for the long value the planner expects 
5 rows, while for the short one it expects 1 row. That may seem a bit 
strange, but I'd bet it finds the short value in some statistic (MCV, 
histogram) ans so can provide very accurate estimate. While for the 
longer one, it ends up using some default (0.5% for equality IIRC) or 
value deduced from ndistinct. Or something like that.

The differences between the two plans are rather negligible, both in 
terms of costs (8.3 vs. 14.81) and runtime (0.1 vs 0.2 ms). The choice 
of a sequential scan seems perfectly reasonable for such tiny tables.

FWIW it's impossible to draw conclusions based on two EXPLAIN ANALYZE 
executions. The timing instrumentation from EXPLAIN ANALYZE may have 
significant impact impact (different for each plan!). You also need to 
testing with more values and longer runs, not just a single execution 
(there are caching effects etc.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance