Re: plan using BTree VS GIN

Laurenz Albe <laurenz.albe@xxxxxxxxxxx> · Fri, 23 Jun 2023 14:52:57 +0200

On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote:
> we faced an issue with a select query on a relatively large table on our database.
> The query involves one single table. The table has more than 10 million records.
> It's mainly composed of varchar fields, have a primary key (id) of type serial,
> and when records of this table are shown to users, they are sorted users 2 fields,
> display_name (varchar) and id (the primary key). Because this table is heavily used
> in various contexts in our application, we have multiple indexes on it. Among other
> index, we have gin index on some fields of the table.
> 
> The btree index res_partner_displayname_id_idx have been added lately and perfectly
> match a criteria (where active) and sorting (display_name, id) we have in quite all
> our queries on this table.
> 
> The query that cause the issue is this one:
> SELECT "res_partner"."id"
>   FROM "res_partner"
>  WHERE (("res_partner"."active" = true) AND
>          (
>          (
>            (
>              ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
>            OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
>         OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))
>      OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
>    OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))
> 
>  AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL )
> 
> ORDER BY "res_partner"."display_name" ,"res_partner"."id"  
>    LIMIT 100
> 
> We have the common criteria  (active=true), the common sorting, a limit, and a search
> on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.
> 
>  Here is the query plan:
>  Limit  (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
>    Output: id, display_name
>    Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
>    ->  Index Scan using res_partner_displayname_id_idx on public.res_partner  (cost=0.56..1200212.37 rows=11214 width=25) (actual time=56383.793..86509.022 rows=1 loops=1)
>          Output: id, display_name
>          Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR
> (unaccent((res_partner.email)::text) ~~
> * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~*
> '%nse%'::text)))
>          Rows Removed by Filter: 6226870
>          Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
>  Planning Time: 0.891 ms
>  Execution Time: 86509.070 ms
> (10 rows)
> 
> It's not using our gin index at all, but the btree one.

The problem is that PostgreSQL estimates that the index scan will return 11214
rows, when it is actually one.  This makes the plan to scan the table using
an index that matches the ORDER BY clause appealing: we might find 100 rows
quickly and avoid a sort.

You can try to improve the estimates with more detailed statistics,
but if that doesn't do the job, you can modify the ORDER BY clause so
that it cannot use the bad index:

  ORDER BY res_partner.display_name ,res_partner.id + 0

Yours,
Laurenz Albe