2011/5/24 panam <panam@xxxxxxx>: > Hi, > > In my DB, there is a large table containing messages and one that contains > message boxes. > Messages are assigned to boxes via a child parent link m->b. > In order to obtain the last message for a specific box, I use the following > SQL: > > SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id > AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id; > > This worked quite well for a long time. But now, suddenly the performance of > this query drastically degraded as new messages were added. > If these new messages are removed again, everything is back to normal. If > other messages are removed instead, the problem remains, so it does not seem > to be a memory issue. I fear I have difficulties to understand what is going > wrong here. We need more information here. The case is in fact interesting. What's the PostgreSQL version, and did you have log of vacuum and checkpoint activity ? (no vacuum full/cluster or such thing running ?) Obvisouly, Craig suggestion to read http://wiki.postgresql.org/wiki/SlowQueryQuestions is relevant as it helps to have all common information required to analyze the issue. > > This is the query plan when everything is fine: > > "Seq Scan on public.box this_ (cost=0.00..10467236.32 rows=128 width=696) > (actual time=0.169..7683.978 rows=128 loops=1)" > " Output: this_.id, this_.login, (SubPlan 1)" > " Buffers: shared hit=188413 read=94635 written=135, temp read=22530 > written=22374" > " SubPlan 1" > " -> Hash Anti Join (cost=41323.25..81775.25 rows=20427 width=8) > (actual time=59.571..59.877 rows=1 loops=128)" > " Output: m1.id" > " Hash Cond: (m1.box_id = m2.box_id)" > " Join Filter: (m1.id < m2.id)" > " Buffers: shared hit=188412 read=94633 written=135, temp > read=22530 written=22374" > " -> Bitmap Heap Scan on public.message m1 (cost=577.97..40212.28 > rows=30640 width=16) (actual time=3.152..9.514 rows=17982 loops=128)" > " Output: m1.id, m1.box_id" > " Recheck Cond: (m1.box_id = $0)" > " Buffers: shared hit=131993 read=9550 written=23" > " -> Bitmap Index Scan on "message_box_Idx" > (cost=0.00..570.31 rows=30640 width=0) (actual time=2.840..2.840 rows=18193 > loops=128)" > " Index Cond: (m1.box_id = $0)" > " Buffers: shared hit=314 read=6433 written=23" > " -> Hash (cost=40212.28..40212.28 rows=30640 width=16) (actual > time=26.840..26.840 rows=20014 loops=115)" > " Output: m2.box_id, m2.id" > " Buckets: 4096 Batches: 4 (originally 2) Memory Usage: > 5444kB" > " Buffers: shared hit=56419 read=85083 written=112, temp > written=7767" > " -> Bitmap Heap Scan on public.message m2 > (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.419..20.007 > rows=20014 loops=115)" > " Output: m2.box_id, m2.id" > " Recheck Cond: (m2.box_id = $0)" > " Buffers: shared hit=56419 read=85083 written=112" > " -> Bitmap Index Scan on "message_box_Idx" > (cost=0.00..570.31 rows=30640 width=0) (actual time=2.166..2.166 rows=20249 > loops=115)" > " Index Cond: (m2.box_id = $0)" > " Buffers: shared hit=6708" > "Total runtime: 7685.202 ms" > > This is the plan when the query gets sluggish: > > "Seq Scan on public.box this_ (cost=0.00..10467236.32 rows=128 width=696) > (actual time=0.262..179333.086 rows=128 loops=1)" > " Output: this_.id, this_.login, (SubPlan 1)" > " Buffers: shared hit=189065 read=93983 written=10, temp read=22668 > written=22512" > " SubPlan 1" > " -> Hash Anti Join (cost=41323.25..81775.25 rows=20427 width=8) > (actual time=1264.700..1400.886 rows=1 loops=128)" > " Output: m1.id" > " Hash Cond: (m1.box_id = m2.box_id)" > " Join Filter: (m1.id < m2.id)" > " Buffers: shared hit=189064 read=93981 written=10, temp read=22668 > written=22512" > " -> Bitmap Heap Scan on public.message m1 (cost=577.97..40212.28 > rows=30640 width=16) (actual time=3.109..9.850 rows=18060 loops=128)" > " Output: m1.id, m1.box_id" > " Recheck Cond: (m1.box_id = $0)" > " Buffers: shared hit=132095 read=9448" > " -> Bitmap Index Scan on "message_box_Idx" > (cost=0.00..570.31 rows=30640 width=0) (actual time=2.867..2.867 rows=18193 > loops=128)" > " Index Cond: (m1.box_id = $0)" > " Buffers: shared hit=312 read=6435" > " -> Hash (cost=40212.28..40212.28 rows=30640 width=16) (actual > time=27.533..27.533 rows=20102 loops=115)" > " Output: m2.box_id, m2.id" > " Buckets: 4096 Batches: 4 (originally 2) Memory Usage: > 5522kB" > " Buffers: shared hit=56969 read=84533 written=10, temp > written=7811" > " -> Bitmap Heap Scan on public.message m2 > (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.406..20.492 > rows=20102 loops=115)" > " Output: m2.box_id, m2.id" > " Recheck Cond: (m2.box_id = $0)" > " Buffers: shared hit=56969 read=84533 written=10" > " -> Bitmap Index Scan on "message_box_Idx" > (cost=0.00..570.31 rows=30640 width=0) (actual time=2.170..2.170 rows=20249 > loops=115)" > " Index Cond: (m2.box_id = $0)" > " Buffers: shared hit=6708" > "Total runtime: 179334.310 ms" > > > So from my limited experience, the only significant difference I see is that > the Hash Anti Join takes a lot more time in plan 2, but I do not understand > why. > Ideas somebody? > > Thanks > panam > > > -- > View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4420974.html > Sent from the PostgreSQL - performance mailing list archive at Nabble.com. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ ; PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance