Greetings,

* kpi6288@xxxxxxxxx (kpi6288@xxxxxxxxx) wrote:
> The CTE mentioned below completes the query in 4.5 seconds while the
> regular query takes 66 seconds. I read from EXPLAIN ANALYSE that the
> regular query starts with a full table scan over "Doc" while the CTE
> joins the two tables first and applies the filter condition in the 2nd
> step.
>
> I believe that some rows in "Doc" which are not referenced by "F" contain
> a large amount of data in the field "szText" and this will slow down the
> ILIKE operator.

Yup, that appears to be what's happening.

> What can I do to improve the performance of the regular query without
> using a CTE?

You could possibly build a trigram index on the field you're searching,
which could avoid the full table scan.  Of course, that index could be
quite large, so there are downsides to that.

If these are words you're looking for then you could use PG's full text
indexing to build indexes on the words and then use that instead.

If you are fine working with words but are concerned about misspellings,
then you can extract the distinct words, build a trigram index on those,
find the most similar words based on the input, and then search for those
words using the FTI.

Rough, untested sketches of each of these are appended at the end of this
mail.

Unfortunately, we don't currently pay attention to things like average
string length when considering the cost of performing an ILIKE, so we
figure that doing the filtering first and then the join will be faster,
but that obviously falls over in some cases, like this one.  Using the CTE
forces PG (today, at least) to do the join first, but that isn't really
something to rely on.

Thanks!

Stephen
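
Sketch of the trigram approach.  The index name is made up, and I'm
assuming "Doc" and "szText" really are mixed-case (quoted) identifiers, so
adjust to your actual schema:

    -- pg_trgm ships in contrib; a GIN trigram index can serve
    -- LIKE/ILIKE '%...%' pattern matches without a seqscan over "Doc".
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    CREATE INDEX doc_sztext_trgm_idx
        ON "Doc" USING gin ("szText" gin_trgm_ops);

The regular query (the join plus WHERE "szText" ILIKE '%term%') should
then be able to use the index instead of scanning all of "Doc".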
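
If whole-word searches are enough, a full text index might look something
like this ('english' is just a guess at the right text search
configuration for your data):

    -- Expression index over a tsvector built from "szText".
    CREATE INDEX doc_sztext_fts_idx
        ON "Doc" USING gin (to_tsvector('english', "szText"));

    -- The query has to use the same expression so the planner can match
    -- it to the index; plainto_tsquery() turns user input into a tsquery.
    SELECT d.*
      FROM "Doc" d
     WHERE to_tsvector('english', d."szText")
           @@ plainto_tsquery('english', 'search words');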
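
And a sketch of the misspelling-tolerant variant: pull the distinct words
out of the documents, trigram-index that word list, map the user's
(possibly misspelled) input onto the closest real words, and then run
those through the full text index above.  The doc_words table and the
example input are made up, and the word list would need to be rebuilt (or
maintained with a trigger) as "Doc" changes:

    -- ts_stat() extracts the distinct lexemes from the documents.
    CREATE TABLE doc_words AS
        SELECT word
          FROM ts_stat('SELECT to_tsvector(''english'', "szText") FROM "Doc"');

    CREATE INDEX doc_words_trgm_idx
        ON doc_words USING gin (word gin_trgm_ops);

    -- Find the words most similar to the input ('%' is pg_trgm's
    -- similarity operator, similarity() gives the score)...
    SELECT word
      FROM doc_words
     WHERE word % 'serch'
     ORDER BY similarity(word, 'serch') DESC
     LIMIT 5;

    -- ...and then OR the results together into the tsquery, e.g.
    -- to_tsquery('english', 'search | searched').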