Re: [pg_trgm] Making similarity(?, ?) < ? use an index

Artur Zakirov <a.zakirov@xxxxxxxxxxxxxx> · Fri, 3 Jun 2016 14:58:24 +0300

Oh, I understand. It is because you want different limits for 
restaurants and cinemas?

I see only one solution. It is custom extension, which will create 
operator class similar to gin_trgm_ops and will depends on pg_trgm. In 
gin_trgm_consistent() you can use your own limit variable.

As I know functions do not use indexes.

Of course I may be wrong. And somebody knows a better solution.

On 03.06.2016 14:24, Greg Navis wrote:
Artur, thanks for your reply. That's right, `%` does use the index. The
goal of using `similarity(lhs, rhs) >= show_limit()` was to replace
`show_limit()` with a custom, per-query limit. I noticed that the latter
approach does _not_ use the index, hence my question:

grn=# EXPLAIN ANALYZE SELECT * FROM restaurants WHERE city % 'warsw';
                                                                  QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on restaurants  (cost=24.28..1319.36 rows=515
width=10) (actual time=96.081..96.456 rows=400 loops=1)
   Recheck Cond: ((city)::text % 'warsw'::text)
   Heap Blocks: exact=359
   ->  Bitmap Index Scan on restaurants_city_gist_trgm_idx
 (cost=0.00..24.15 rows=515 width=0) (actual time=96.030..96.030
rows=400 loops=1)
         Index Cond: ((city)::text % 'warsw'::text)
 Planning time: 0.211 ms
 Execution time: 96.528 ms
(7 rows)

grn=# EXPLAIN ANALYZE SELECT * FROM restaurants WHERE similarity(city,
'warsw') >= show_limit();
                                                     QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
 Seq Scan on restaurants  (cost=0.00..11692.81 rows=171825 width=10)
(actual time=14.520..692.520 rows=400 loops=1)
   Filter: (similarity((city)::text, 'warsw'::text) >= show_limit())
   Rows Removed by Filter: 515075
 Planning time: 0.109 ms
 Execution time: 692.560 ms
(5 rows)

If this functionality isn't supported then it might be a good idea for a
contribution.

Best regards

On Fri, Jun 3, 2016 at 12:51 PM, Artur Zakirov <a.zakirov@xxxxxxxxxxxxxx
<mailto:a.zakirov@xxxxxxxxxxxxxx>> wrote:

    Hello.

    As I know 'lhs % rhs' is equivalent to 'similarity(lhs, rhs) >=
    show_limit()'.

    And so your query should looks like this:

    SELECT * FROM restaurants WHERE city % 'warsw';

    And it should use index.

    On 03.06.2016 13:35, Greg Navis wrote:

        Hey!

        I'm playing with pg_trgm. It seems that `lhs % rhs` is _almost_
        equivalent to `similarity(lhs, rhs) < show_limit()`. The
        difference that
        I noticed is that `%` uses a GIN index while `similarity` does not.

        ```
        grn=# \d restaurants
                 Table "public.restaurants"
         Column |          Type          | Modifiers
        --------+------------------------+-----------
         city   | character varying(255) | not null
        Indexes:
            "restaurants_city_trgm_idx" gin (city gin_trgm_ops)

        grn=# SELECT COUNT(*) FROM restaurants;
         count
        --------
         515475
        (1 row)

        Time: 45.964 ms
        grn=# EXPLAIN ANALYZE SELECT * FROM restaurants WHERE
        similarity(city,
        'warsw') > show_limit();
                                                             QUERY PLAN

        --------------------------------------------------------------------------------------------------------------------
         Seq Scan on restaurants  (cost=0.00..11692.81 rows=171825 width=10)
        (actual time=16.436..665.062 rows=360 loops=1)
           Filter: (similarity((city)::text, 'warsw'::text) > show_limit())
           Rows Removed by Filter: 515115
         Planning time: 0.139 ms
         Execution time: 665.105 ms
        (5 rows)

        Time: 665.758 ms
        ```

        My question is: is it possible to make `similarity` use the
        index? If
        not, is there a way to speed up the query above?

        Best regards
        --
        Greg Navis
        I help tech companies to scale Heroku-hosted Rails apps.
        Free, biweekly scalability newsletter for SaaS CEOs
        <http://www.gregnavis.com/newsletter/>

    --
    Artur Zakirov
    Postgres Professional: http://www.postgrespro.com
    Russian Postgres Company

--
Greg Navis
I help tech companies to scale Heroku-hosted Rails apps.
Free, biweekly scalability newsletter for SaaS CEOs
<http://www.gregnavis.com/newsletter/>

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general