Re: Number of characters in column preventing index usage

Tom Lane <tgl@xxxxxxxxxxxxx> · Fri, 17 Feb 2017 19:04:41 -0500

"David G. Johnston" <david.g.johnston@xxxxxxxxx> writes:
> On Fri, Feb 17, 2017 at 3:49 PM, Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx>
> wrote:
>> That may seem a bit strange, but I'd bet it finds the short value in some
>> statistic (MCV, histogram) ans so can provide very accurate estimate.

> I'm not seeing how any of the statistic columns would capture a value that
> doesn't actually appear in the table...(actual ... row=0)

I think it's the other way around.  It found
'01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea' in the stats, concluded
(accurately) that there would be five matches, and on the strength of that
decided that a seqscan over this very tiny table would be faster than an
indexscan.  In the other case, the short string exists neither in the
table nor the stats, and the default estimate is turning out to be that
there's a single match, for which it likes the indexscan solution.  This
is all pretty unsurprising if '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'
is in the most-common-values list.  Anything that's *not* in that list
is going to get a smaller rowcount estimate.  (I don't think that the
string length, per se, has anything to do with it.)

I'm not sure what performance problem the OP was looking to solve,
but expecting experiments on toy-sized tables to give the same plans
as you get on large tables is a standard mistake when learning to work
with the PG planner.

Also, if toy-sized tables are all you've got, meaning the whole database
can be expected to stay RAM-resident at all times, it'd be a good idea
to reduce random_page_cost to reflect that.  The default planner cost
settings are meant for data that's mostly on spinning rust.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance