plan using BTree VS GIN

Nicolas Seinlet <nicolas@xxxxxxxxxxx> · Fri, 23 Jun 2023 12:08:38 +0000

Hello,

we faced an issue with a select query on a relatively large table on our database. The query involves one single table. The table has more than 10 million records. It's mainly composed of varchar fields, have a primary key (id) of type serial, and when records of this table are shown to users, they are sorted users 2 fields, display_name (varchar) and id (the primary key). Because this table is heavily used in various contexts in our application, we have multiple indexes on it. Among other index, we have gin index on some fields of the table.

Among other things, we're using unaccent. We are aware the unaccent function is mutable, but we have an immutable version of unaccent.

the table is similar to (I can give you all the fields of the table if needed):
                                                    Table "public.res_partner"
              Column               |            Type             | Collation | Nullable |                 Default                
-----------------------------------+-----------------------------+-----------+----------+-----------------------------------------
 id                                | integer                     |           | not null | nextval('res_partner_id_seq'::regclass)
 active                            | boolean                     |           |          |
 name                              | character varying           |           |          |
 display_name                      | character varying           |           |          |
 ref                               | character varying           |           |          |
 email                             | character varying           |           |          |
 vat                               | character varying           |           |          |
 type                              | character varying           |           |          |
 company_registry                  | character varying           |           |          |

Gin Index:
"res_partner_unaccent_tgm_ref" gin (unaccent(ref::text) gin_trgm_ops) WHERE ref IS NOT NULL
"res_partner_unaccent_tgm_vat" gin (unaccent(vat::text) gin_trgm_ops) WHERE vat IS NOT NULL
"res_partner_unaccent_tgm_idx_gin2" gin (unaccent(name::text) gin_trgm_ops, unaccent(display_name::text) gin_trgm_ops, unaccent(ref::text) gin_trgm_ops, unaccent(email::text) gin_trgm_ops, unaccent(vat::text) gin_trgm_ops)
"res_partner_name_tgm_idx_gin" gin (name gin_trgm_ops, display_name gin_trgm_ops, ref gin_trgm_ops, email gin_trgm_ops, vat gin_trgm_ops)
"res_partner_unaccent_tgm_display_namee" gin (unaccent(display_name::text) gin_trgm_ops)
"res_partner_unaccent_tgm_email" gin (unaccent(email::text) gin_trgm_ops) WHERE email IS NOT NULL
"res_partner_comp_reg_idx3" gin (unaccent(company_registry::text) gin_trgm_ops) WHERE company_registry IS NOT NULL

BTree index:
"res_partner_displayname_id_idx" btree (display_name, id) WHERE active
"res_partner_comp_reg_idx2" btree (unaccent(company_registry::text)) WHERE company_registry IS NOT NULL

The btree index res_partner_displayname_id_idx have been added lately and perfectly match a criteria (where active) and sorting (display_name, id) we have in quite all our queries on this table.

The query that cause the issue is this one:
SELECT "res_partner"."id"
  FROM "res_partner"
 WHERE (("res_partner"."active" = true) AND
         (
         (
           (
             ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
           OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
        OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))
     OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
   OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))

 AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL )

ORDER BY "res_partner"."display_name" ,"res_partner"."id"  
   LIMIT 100

We have the common criteria  (active=true), the common sorting, a limit, and a search on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.

 Here is the query plan:
 Limit  (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
   Output: id, display_name
   Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
   ->  Index Scan using res_partner_displayname_id_idx on public.res_partner  (cost=0.56..1200212.37 rows=11214 width=25) (actual time=56383.793..86509.022 rows=1 loops=1)
         Output: id, display_name
         Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~
* '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)))
         Rows Removed by Filter: 6226870
         Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
 Planning Time: 0.891 ms
 Execution Time: 86509.070 ms
(10 rows)

It's not using our gin index at all, but the btree one.

If we change some things, like:
   remove criteria on company_registry field
or limit=10k
or longer string than 3 characters, like '%full name of partner%'
or sometimes, but not every time and for a limited period of time, analyze the table
or set enable_indexscan=FALSE

the plan switch to:
 Limit  (cost=3061.03..3061.28 rows=100 width=25) (actual time=496.092..496.095 rows=1 loops=1)
   Output: id, display_name
   Buffers: shared hit=4527 read=2791
   ->  Sort  (cost=3061.03..3065.70 rows=1868 width=25) (actual time=496.091..496.094 rows=1 loops=1)
         Output: id, display_name
         Sort Key: res_partner.display_name, res_partner.id
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=4527 read=2791
         ->  Bitmap Heap Scan on public.res_partner  (cost=958.14..2989.64 rows=1868 width=25) (actual time=496.050..496.053 rows=1 loops=1)
               Output: id, display_name
               Recheck Cond: ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text))
               Filter: (res_partner.active AND (((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)))
               Heap Blocks: exact=1
               Buffers: shared hit=4521 read=2791
               ->  BitmapOr  (cost=958.14..958.14 rows=1960 width=0) (actual time=496.020..496.022 rows=0 loops=1)
                     Buffers: shared hit=4520 read=2791
                     ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..230.83 rows=645 width=0) (actual time=228.725..228.725 rows=1 loops=1)
                           Index Cond: (unaccent((res_partner.display_name)::text) ~~* '%nse%'::text)
                           Buffers: shared hit=2169 read=1374
                     ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..230.81 rows=642 width=0) (actual time=256.083..256.083 rows=0 loops=1)
                           Index Cond: (unaccent((res_partner.email)::text) ~~* '%nse%'::text)
                           Buffers: shared hit=1906 read=1348
                     ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..226.00 rows=1 width=0) (actual time=2.693..2.693 rows=0 loops=1)
                           Index Cond: (unaccent((res_partner.ref)::text) ~~* '%nse%'::text)
                           Buffers: shared hit=178 read=7
                     ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..226.14 rows=19 width=0) (actual time=8.414..8.415 rows=0 loops=1)
                           Index Cond: (unaccent((res_partner.vat)::text) ~~* '%nse%'::text)
                           Buffers: shared hit=225 read=62
                     ->  Bitmap Index Scan on res_partner_comp_reg_idx3  (cost=0.00..42.01 rows=654 width=0) (actual time=0.099..0.099 rows=0 loops=1)
                           Index Cond: (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)
                           Buffers: shared hit=42
 Planning Time: 20.921 ms
 Execution Time: 496.296 ms
(33 rows)

How can we have PostgreSQL choose the second query plan? If we look at the expected VS actual rows, statistics look quite accurate.

Thanks for reading,

Nicolas.

Attachment:
publickey - nicolas@seinlet.com - 0xCAEB7FAF.asc

Description: application/pgp-keys
Attachment:
signature.asc

Description: OpenPGP digital signature