Re: Slow query plan used

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 01 Jun 2017 12:41:42 -0400

Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> writes:
> LIKE queries are probably challenging to plan, especially when they're not
> left-anchored: how can the planner be reasonalbly expected to estimate how
> many rows will be matched by a given LIKE expression.

Yeah, especially without any statistics.  The core problem here appears
to be the poor rowcount estimate for the LIKE:

              ->  Seq Scan on c3k_document_index  (cost=0.00..15160.48 rows=1063 width=285) (actual time=0.206..13539.353 rows=33022 loops=1)
                    Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR (lower(searchfield8) ~~ '%sehr%'::text))
                    Rows Removed by Filter: 33832

That's off by a factor of 30.

> Not having looked at the code, I would guess that the length of the LIKE
> expression will make the planner assume that the match is more restrictive,
> while many % and _ in the LIKE expression make the planner assume that the
> match is less restrictive.

Exactly.  It's not terribly bright, but the extra %'s make it think that
the pattern is less selective, producing a higher rowcount estimate that
--- by chance -- better matches reality.  Then you get better decisions
about how to shape the rest of the plan.

You might get some traction by creating indexes on lower(searchfield1)
etc.  This isn't even necessarily with an expectation that the planner
would use those indexes in the plan ... but what it would do is make
use of the statistics that ANALYZE will accumulate about the indexed
expressions.  I think that would give you better estimates about the
LIKE rowcounts.  You might have to crank up the statistics target for
those indexes if the default isn't enough to make the estimates
significantly better.  (Obviously, don't forget to re-ANALYZE before
checking results.)

> That in mind, let me throw pg_trgm into the mix of things to try:
> https://www.postgresql.org/docs/current/static/pgtrgm.html
> The trigram module allows you to create indexes that LIKE can use
> to do index searches instead of always having to do sequential scans
> or push the LIKE matching to another part of the plan tree.

For cases like the above, where half the table is getting selected,
I would not expect the planner to use an index anyway.  Any old index
on lower(searchfieldN) will work in terms of inducing ANALYZE to
collect stats.  But if you have other queries with more selective
LIKE patterns then maybe a trigram index in particular is worthwhile.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general