Re: How to Optimize pg_trgm Performance

"Ivan E. Panchenko" <i.panchenko@xxxxxxxxxxxxxx> · Wed, 31 Jan 2018 01:50:46 +0300



    Hi Igal,
    29.01.2018 02:42, Igal @ Lucee.org пишет:

    
      I want to use pg_trgm for auto-suggest functionality.  I
        created a Materialized View with the information that I need,
        with the relevant columns being (keywords text, rank int). 
        keywords is the column from which I build the tri-grams, and
        rank is some popularity factor so that popular results will show
        up higher than less popular results given the same tri-gram
        distance.

      
      I want to return results in the order of [distance],
        [distance_word], [rank].  The input comes from the user and is
        not known in advance.  My query is as follows:

      
          SELECT title

                ,id

                ,(input <-> keywords) AS distance

                ,(input <<-> keywords) AS
          distance_word

                ,rank

            FROM  (VALUES (cast('red pill' AS text)))
          consts(input)

                ,mv_autosuggest

            ORDER BY 3, 4, 5

              LIMIT 20;

      
      This gives me pretty good results, but it takes too long and is
        not likely to scale well.
      I have created two indexes but neither seem to be used:
      CREATE INDEX mv_autosuggest_keywords_tgrm_gist ON
          staging.mv_autosuggest USING gist (keywords gist_trgm_ops);

        
        CREATE INDEX mv_autosuggest_keywords_tgrm_gin ON
          staging.mv_autosuggest USING gin (keywords gin_trgm_ops);
      This is the result of explain analyze:

        
      QUERY
PLAN                                                                                                                                
          |

-------------------------------------------------------------------------------------------------------------------------------------------|

          Limit  (cost=356.41..356.46 rows=20 width=51) (actual
          time=163.132..163.135 rows=20
          loops=1)                                               |

            ->  Sort  (cost=356.41..372.96 rows=6619 width=51)
          (actual time=163.130..163.131 rows=20
          loops=1)                                        |

                  Sort Key: (('red pill'::text <->
          mv_autosuggest.keywords)), (('red pill'::text <<->
          mv_autosuggest.keywords)), mv_autosuggest.rank |

                  Sort Method: top-N heapsort  Memory:
28kB                                                                                         
          |

                  ->  Seq Scan on mv_autosuggest  (cost=0.00..180.29
          rows=6619 width=51) (actual time=0.263..161.289 rows=6619
          loops=1)              |

          Planning time: 0.139
ms                                                                                                                   
          |

          Execution time: 163.174
ms                                                                                                                
          |

      
      How can I improve the performance here?

      
    This can be improved if you use sort only by distances

    (try ORDER BY 3,4 

    or ORDER BY 3

    or ORDER BY 4

    
    Than you should get  plan  like 

    
    Index Scan using mv_autosuggest_keywords_tgrm_gist on
      mv_autosuggest  

             Order By: ((keywords <-> 'red pill'::text)
      AND (keywords <->> 'red pill'::text))

      
    Which means that KNN  index search is enabled :
    http://www.sai.msu.su/~megera/postgres/talks/pgcon-2010-1.pdf

    
    If you want to sort it also by rank, you can make a two-level
    construction like:

    
    SELECT * FROM (

       SELECT .... ORDER BY 3,4 LIMIT ....  /* make some empirical
    redundant limit here */

    ) foo ORDER BY 5 LIMIT ....

    
    Regards,

    Ivan Panchenko

    
      Thank you,

        
        Igal Sapir 

          Lucee Core Developer 

          Lucee.org