Re: Slow query because lexeme index not used

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Could you show the table stats for product.id ?  In particular its
"correlation".

 frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation
----------+-----------+---------+-----------+-----------+------------+-------+--------+-------------
          | product   | id      | f         |         0 |         -1 |       |    101 |   0.3857521

> How large is shared_buffers ?
256MB

> Does the query plan improve if you increase work_mem ?
No, same plan.

> Maybe you could encourage scanning in order of product_property.product...
Clustering "product_property_default" on "product_property_default_product_idx" followed by analyze, does not change the plan.

> Or you could write the query with a temp table:
Creating the temp table "product_ids" changes the plan to use lexeme_idx: https://explain.dalibo.com/plan/19h
But I prefer not to use an extra step for my query.

> Maybe it would help to create stats on "first" and "meaning"...
I've played around with statistics, and increasing column stats with the extended statistics improve the planner estimation (nested loop 69x to 12x), but the same ineffective plan is issued, without lexeme_idx:
https://explain.dalibo.com/plan/B7d#plan (has query with stats)




On Sunday, August 8, 2021, 3:35:31 AM GMT+3, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:


On Sat, Aug 07, 2021 at 07:35:25PM +0000, Alex wrote:
> Table "product" has a GIN index on "lexeme" column (tsvector) that is not used.
>
> Query that doesn't use lexeme idx:  https://explain.dalibo.com/plan/BlB#plan, ~8s, ~60.000 blocks needed
>
> Query forced to use lexeme idx: https://explain.dalibo.com/plan/i52, ~800ms (10x less), ~15.000 blocks needed (x4 less)

Could you show the table stats for product.id ?  In particular its
"correlation".

I guess the correlation is ~1, and the 10,659 index scans on product.id are
considered to be cheaper than scannning the lexeme index - since there are no
correlation stats for tsvector.

How large is shared_buffers ?

Does the query plan improve if you increase work_mem ?

Maybe you could encourage scanning in order of product_property.product.
You could CLUSTER product_property_default on an index on "product" and then
ANALYZE.  Or you could write the query with a temp table:

CREATE TEMP TABLE product_ids AS
SELECT product
FROM product_property
WHERE "meaning" = 'B' AND "first" = 1.7179869184E10
GROUP BY 1 -- or DISTINCT, because the table is only used for EXISTS
ORDER BY 1; -- to scan product in order of id
ANALYZE product_ids;

The index scans on product.id should be faster when you use
EXISTS(SELECT 1 FROM product_ids ...), even though it didn't use the lexeme index.

Maybe it would help to create stats on "first" and "meaning"; the rowcount is
underestimated by 3x, which means it did several times more index scans into
"product" than planned.
| Bitmap Heap Scan on product_property_default product_property_default (cost=2,748.6..8,823.4 rows=6,318 width=4) (actual time=43.945..211.621 rows=21,061 loops=1)

CREATE STATISTICS first_meaning ON first,meaning FROM product_property;
ANALYZE product_property;


> Table metdata:
>          relname          | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
> --------------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
>  product_property_default |     8992 |    622969 |          8992 | r       |       16 | f              |            |      73719808
>  product                  |    49686 |    413840 |         49686 | r       |       14 | f              |            |     493314048
>
> Table stats:
>    frac_mcv    |        tablename         | attname | inherited | null_frac | n_distinct  | n_mcv | n_hist | correlation
> ---------------+--------------------------+---------+-----------+-----------+-------------+-------+--------+-------------
>                | product                  | lexeme  | f         |         0 |          -1 |       |        |
>     0.99773335 | product_property_default | meaning | f         |         0 |          63 |    39 |     24 |  0.19444875
>      0.6416333 | product_property_default | first   | f         |         0 |        2193 |   100 |    101 | -0.09763639
>  0.00023333334 | product_property_default | product | f         |         0 | -0.15221785 |     1 |    101 |  0.08643274
>
>
> Using windows docker with wsl2.Both cases are run with cold cache.All database memory is limited to 1GB by using .wslconfig file with memory=1GB, also the docker container is limited to 1GB.
> My requirement is to optimize disk access with this limited memory

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux