Re: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



So, had a bit more time to look into this.

Here is the issue:
Your query is requesting 20 rows.  However, you are doing a sort on sent_at.  Because of this, the database is having to pull all rows that match the status and sender_reference, sort them, and then give you 20.

From your example:
1.  You have 29744 rows that match your criteria.   ->
Index Scan using docs_sent_at_idx_1 on public.docs (cost=0.56..39748945.58 rows=29744 width=38)

2. To get those 29744 rows, the database had to read 5046 database blocks (8KB/block). 45046 * 8KB is your ~220MB of read. ->  
Buffers: shared hit=1421 read=45046

3. Since you are selecting all rows, the database uses the index to find the matching rows and then has to go read the head blocks to retrieve and validate the rows. Then it sorts all the returned rows by date and returns
the first 20 rows.

So, PostgreSQL is preforming well.  It’s just an expensive way to get 20 rows and I don’t see an easy way to make it better if that is what is needed.

Thanks,


Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269
Email: chrish@xxxxxxxxxx



On Jun 12, 2023, at 5:34 PM, benoit <benoit@xxxxxxxxxxxxxxx> wrote:

This new index is used but still the read is 230mb.




De : Chris Hoover <chrish@xxxxxxxxxx>
Envoyé : lundi 12 juin 2023 22:55
À : benoit
Cc : pgsql-performance@xxxxxxxxxxxxxxxxxxxx
Objet : Re: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT
 
I normally create my indexes to match the where clause of the query. While technically, it should not matter, I find a lot of time, it does.

I would create an index on (status, sender_reference, sent_at) and see if the improves your query performance.

SELECT * FROM docs WHERE status IN ('draft', 'sent') AND sender_reference IN ('Custom/1175', 'Client/362', 'Custom/280') ORDER BY sent_at DESC
Thanks,


Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269
Email: chrish@xxxxxxxxxx



On Jun 12, 2023, at 4:17 PM, benoit <benoit@xxxxxxxxxxxxxxx> wrote:

Hello

I have a database with few 60gb tables. Tables rows are requested with multiple ANY or IN operators. I am not able to find an easy way to make DB able to use indexes. I often hit the index, but see a a spike of 200mb of IO or disk read.

I am using version 13 but soon 14.

I wrote a reproduction script on version 14 with plans included. https://gist.github.com/benoittgt/ab72dc4cfedea2a0c6a5ee809d16e04d

I also have plans on a snapshot of the DB with real data.
- The current query that I try to improve : https://explain.dalibo.com/plan/8b8f6e0he9feb551
  - I added the DB schema + index in query view. As you can see I have many indexes for testing purpose and try what the planner can do.
- The optimized query when I have only one ANY and migrate to UNION ALL for each parameter of the ANY operator https://explain.dalibo.com/plan/427gg053d07328ga . Query is fast as I would like but it means generate some merge to be able to get a fast result.
- The new issue I have when I have a new ANY operator on the previous optimized query. Big IO/read https://explain.dalibo.com/plan/e7ha9g637b4eh946

It seems to me quite undoable to generate for every parameters a query that will then merge. I have sometimes 3-4 ANY operators with up to 15 elements in an array.

Is there a misusage of my indexes?
Is there a limitation when using ANY or IN operators and ordered LIMIT behind?

Thanks a lot


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux