Hash Anti Join performance degradation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

In my DB, there is a large table containing messages and one that contains
message boxes.
Messages are assigned to boxes via a child parent link m->b.
In order to obtain the last message for a specific box, I use the following
SQL:

SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;

This worked quite well for a long time. But now, suddenly the performance of
this query drastically degraded as new messages were added.
If these new messages are removed again, everything is back to normal. If
other messages are removed instead, the problem remains, so it does not seem
to be a memory issue. I fear I have difficulties to understand what is going
wrong here.

This is the query plan when everything is fine:

"Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
(actual time=0.169..7683.978 rows=128 loops=1)"
"  Output: this_.id, this_.login, (SubPlan 1)"
"  Buffers: shared hit=188413 read=94635 written=135, temp read=22530
written=22374"
"  SubPlan 1"
"    ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
(actual time=59.571..59.877 rows=1 loops=128)"
"          Output: m1.id"
"          Hash Cond: (m1.box_id = m2.box_id)"
"          Join Filter: (m1.id < m2.id)"
"          Buffers: shared hit=188412 read=94633 written=135, temp
read=22530 written=22374"
"          ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
rows=30640 width=16) (actual time=3.152..9.514 rows=17982 loops=128)"
"                Output: m1.id, m1.box_id"
"                Recheck Cond: (m1.box_id = $0)"
"                Buffers: shared hit=131993 read=9550 written=23"
"                ->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.840..2.840 rows=18193
loops=128)"
"                      Index Cond: (m1.box_id = $0)"
"                      Buffers: shared hit=314 read=6433 written=23"
"          ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
time=26.840..26.840 rows=20014 loops=115)"
"                Output: m2.box_id, m2.id"
"                Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
5444kB"
"                Buffers: shared hit=56419 read=85083 written=112, temp
written=7767"
"                ->  Bitmap Heap Scan on public.message m2 
(cost=577.97..40212.28 rows=30640 width=16) (actual time=2.419..20.007
rows=20014 loops=115)"
"                      Output: m2.box_id, m2.id"
"                      Recheck Cond: (m2.box_id = $0)"
"                      Buffers: shared hit=56419 read=85083 written=112"
"                      ->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.166..2.166 rows=20249
loops=115)"
"                            Index Cond: (m2.box_id = $0)"
"                            Buffers: shared hit=6708"
"Total runtime: 7685.202 ms"

This is the plan when the query gets sluggish:

"Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
(actual time=0.262..179333.086 rows=128 loops=1)"
"  Output: this_.id, this_.login, (SubPlan 1)"
"  Buffers: shared hit=189065 read=93983 written=10, temp read=22668
written=22512"
"  SubPlan 1"
"    ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
(actual time=1264.700..1400.886 rows=1 loops=128)"
"          Output: m1.id"
"          Hash Cond: (m1.box_id = m2.box_id)"
"          Join Filter: (m1.id < m2.id)"
"          Buffers: shared hit=189064 read=93981 written=10, temp read=22668
written=22512"
"          ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
rows=30640 width=16) (actual time=3.109..9.850 rows=18060 loops=128)"
"                Output: m1.id, m1.box_id"
"                Recheck Cond: (m1.box_id = $0)"
"                Buffers: shared hit=132095 read=9448"
"                ->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.867..2.867 rows=18193
loops=128)"
"                      Index Cond: (m1.box_id = $0)"
"                      Buffers: shared hit=312 read=6435"
"          ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
time=27.533..27.533 rows=20102 loops=115)"
"                Output: m2.box_id, m2.id"
"                Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
5522kB"
"                Buffers: shared hit=56969 read=84533 written=10, temp
written=7811"
"                ->  Bitmap Heap Scan on public.message m2 
(cost=577.97..40212.28 rows=30640 width=16) (actual time=2.406..20.492
rows=20102 loops=115)"
"                      Output: m2.box_id, m2.id"
"                      Recheck Cond: (m2.box_id = $0)"
"                      Buffers: shared hit=56969 read=84533 written=10"
"                      ->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.170..2.170 rows=20249
loops=115)"
"                            Index Cond: (m2.box_id = $0)"
"                            Buffers: shared hit=6708"
"Total runtime: 179334.310 ms"


So from my limited experience, the only significant difference I see is that
the Hash Anti Join takes a lot more time in plan 2, but I do not understand
why.
Ideas somebody?

Thanks
panam


--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4420974.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux