Re: plan using BTree VS GIN

Nicolas Seinlet <nicolas@xxxxxxxxxxx> · Mon, 26 Jun 2023 06:35:14 +0000

On Friday, June 23rd, 2023 at 2:52 PM, Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:

> 

> 

> On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote:
> 

> > we faced an issue with a select query on a relatively large table on our database.
> > The query involves one single table. The table has more than 10 million records.
> > It's mainly composed of varchar fields, have a primary key (id) of type serial,
> > and when records of this table are shown to users, they are sorted users 2 fields,
> > display_name (varchar) and id (the primary key). Because this table is heavily used
> > in various contexts in our application, we have multiple indexes on it. Among other
> > index, we have gin index on some fields of the table.
> > 

> > The btree index res_partner_displayname_id_idx have been added lately and perfectly
> > match a criteria (where active) and sorting (display_name, id) we have in quite all
> > our queries on this table.
> > 

> > The query that cause the issue is this one:
> > SELECT "res_partner"."id"
> > FROM "res_partner"
> > WHERE (("res_partner"."active" = true) AND
> > (
> > (
> > (
> > ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
> > OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
> > OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))
> > OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
> > OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))
> > 

> > AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL )
> > 

> > ORDER BY "res_partner"."display_name" ,"res_partner"."id"
> > LIMIT 100
> > 

> > We have the common criteria (active=true), the common sorting, a limit, and a search
> > on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.
> > 

> > Here is the query plan:
> > Limit (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
> > Output: id, display_name
> > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> > -> Index Scan using res_partner_displayname_id_idx on public.res_partner (cost=0.56..1200212.37 rows=11214 width=25) (actual time=56383.793..86509.022 rows=1 loops=1)
> > Output: id, display_name
> > Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR
> > (unaccent((res_partner.email)::text) ~~
> > * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~*
> > '%nse%'::text)))
> > Rows Removed by Filter: 6226870
> > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> > Planning Time: 0.891 ms
> > Execution Time: 86509.070 ms
> > (10 rows)
> > 

> > It's not using our gin index at all, but the btree one.
> 

> 

> The problem is that PostgreSQL estimates that the index scan will return 11214
> rows, when it is actually one. This makes the plan to scan the table using
> an index that matches the ORDER BY clause appealing: we might find 100 rows
> quickly and avoid a sort.
> 

> You can try to improve the estimates with more detailed statistics,
> but if that doesn't do the job, you can modify the ORDER BY clause so
> that it cannot use the bad index:
> 

> ORDER BY res_partner.display_name ,res_partner.id + 0
> 

> Yours,
> Laurenz Albe

Hello,

First of all, thanks, this solves the issue for the given query.

Some more questions then,
> we might find 100 rows quickly
The cost estimate for 11214 rows is 1200212.37
If I look at the other plan, none of the estimated cost reach such levels (~2k for indexes + 1k for the BitmapOr, 3k for Bitmap Heap Scan, and finally 1k for sort and limit), roughly 7k

And that's part of what I didn't understand. How is the first cost estimated? If we divide by 110 the cost to go from 11k records to 100, it's still ~10k, more than the other plan.

Thanks again,

Nicolas.

Attachment:
publickey - nicolas@seinlet.com - 0xCAEB7FAF.asc

Description: application/pgp-keys
Attachment:
signature.asc

Description: OpenPGP digital signature