On 16/02/2023 17:15, Ron wrote:
On 2/16/23 09:47, cen wrote:
Hi,
I am running the same application (identical codebase) as two
separate instances to index (save) different sets of data. Both run
PostgreSQL 13.
The queries are the same but the content in actual databases is
different. One database is around 1TB and the other around 300GB.
There is a problem with a paginated select query with a join and an
order. Depending on which column you order by (primary or FK) the
query is either instant or takes minutes.
So on one database, ordering by primary is instant but on the other
it is slow and vice-versa. Switching the columns around on the slow
case fixes the issue.
All relavant colums are indexed.
What does EXPLAIN ANALYZE say?
I finally managed to get back to this and have a similar sample query.
In this case, changing the WHERE clause to use the joined table column
slows the query down.
The initial case was on the ORDER BY column but the simptoms are the
same I think.
I understand that even though both colums are indexed, the indexes are
completely different but the point is, how would one know in advance
which one will be faster when designing the query?
And as I mentioned in my initial text, the fast case of columns can
switch around as the database grows.
Fast case:
SELECT
t0."status",
b1."timestamp"
FROM
"transactions" AS t0
INNER JOIN
"blocks" AS b1
ON
b1."number" = t0."block_number"
WHERE
(((t0."to_address_hash" = '\x3012c'))
OR (t0."from_address_hash" = '\x3012c')
OR (t0."created_contract_address_hash" = '\x3012c'))
AND (t0."block_number" >= 30926000)
AND (t0."block_number" <= 31957494)
ORDER BY
t0."block_number" DESC
LIMIT
150
OFFSET
300;
Plan:
Limit (cost=15911.73..23367.09 rows=150 width=16) (actual
time=205.093..305.423 rows=150 loops=1)
-> Gather Merge (cost=1001.03..812143.50 rows=16320 width=16)
(actual time=36.140..305.333 rows=450 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=1.00..809259.75 rows=6800 width=16)
(actual time=2.662..155.779 rows=153 loops=3)
-> Parallel Index Scan Backward using
transactions_block_number_index on transactions t0 (cost=0.56..753566.08
rows=6800 width=8) (actual time=0.224..145.998 rows=153 loops=3)
Index Cond: ((block_number >= 30926000) AND
(block_number <= 31957494))
Filter: ((to_address_hash = '\x3012c'::bytea) OR
(from_address_hash = '\x3012c'::bytea) OR (created_contract_address_hash
= '\x3012c'::bytea))
Rows Removed by Filter: 22471
-> Index Scan using blocks_number_index on blocks b1
(cost=0.44..8.18 rows=1 width=16) (actual time=0.059..0.060 rows=1
loops=460)
Index Cond: (number = t0.block_number)
Planning Time: 0.513 ms
Execution Time: 305.541 ms
------------------------------
Slow case:
SELECT
t0."status",
b1."timestamp"
FROM
"transactions" AS t0
INNER JOIN
"blocks" AS b1
ON
b1."number" = t0."block_number"
WHERE
(((t0."to_address_hash" = '\x3012c'))
OR (t0."from_address_hash" = '\x3012c')
OR (t0."created_contract_address_hash" = '\x3012c'))
AND (b1."number" >= 30926000) -- using col from joined table instead
AND (b1."number" <= 31957494) -- using col from joined table instead
ORDER BY
t0."block_number" DESC
LIMIT
150
OFFSET
300;
Plan:
Limit (cost=1867319.63..1877754.02 rows=150 width=16) (actual
time=95830.704..95962.116 rows=150 loops=1)
-> Gather Merge (cost=1846450.83..2015348.94 rows=2428 width=16)
(actual time=95805.872..95962.075 rows=450 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Merge Join (cost=1845450.81..2014068.67 rows=1012
width=16) (actual time=95791.362..95824.633 rows=159 loops=3)
Merge Cond: (t0.block_number = b1.number)
-> Sort (cost=1845402.63..1845823.81 rows=168474
width=8) (actual time=95790.194..95790.270 rows=186 loops=3)
Sort Key: t0.block_number DESC
Sort Method: external merge Disk: 2496kB
Worker 0: Sort Method: external merge Disk: 2408kB
Worker 1: Sort Method: external merge Disk: 2424kB
-> Parallel Bitmap Heap Scan on transactions t0
(cost=39601.64..1828470.76 rows=168474 width=8) (actual
time=7274.149..95431.494 rows=137658 loops=3)
Recheck Cond: ((to_address_hash =
'\x3012c'::bytea) OR (from_address_hash = '\x3012c'::bytea) OR
(created_contract_address_hash = '\x3012c'::bytea))
Rows Removed by Index Recheck: 716205
Heap Blocks: exact=7043 lossy=134340
-> BitmapOr (cost=39601.64..39601.64
rows=404359 width=0) (actual time=7264.127..7264.130 rows=0 loops=1)
-> Bitmap Index Scan on
transactions_to_address_hash_recent_collated_index (cost=0.00..326.32
rows=3434 width=0) (actual time=2.314..2.314 rows=5 loops=1)
Index Cond: (to_address_hash =
'\x3012c'::bytea)
-> Bitmap Index Scan on
transactions_from_address_hash_recent_collated_index
(cost=0.00..38967.50 rows=400924 width=0) (actual
time=7261.750..7261.750 rows=419509 loops=1)
Index Cond: (from_address_hash =
'\x3012c'::bytea)
-> Bitmap Index Scan on
transactions_created_contract_address_hash_recent_collated_inde
(cost=0.00..4.57 rows=1 width=0) (actual time=0.059..0.060 rows=0 loops=1)
Index Cond:
(created_contract_address_hash = '\x3012c'::bytea)
-> Index Scan Backward using blocks_number_index on
blocks b1 (cost=0.44..167340.18 rows=189631 width=16) (actual
time=0.139..26.964 rows=27612 loops=3)
Index Cond: ((number >= 30926000) AND (number <=
31957494))
Planning Time: 0.736 ms
Execution Time: 95963.436 ms
Since you've run ANALYZE, when were the tables last vacuumed?
Last autovacuum for transactions: 2023-02-23
Last autovacuum for blocks: 2023-01-23