performance change from 8.3.1 to later releases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I have access to servers running 8.3.1, 8.3.8, 8.4.2 and 8.4.3. I have noticed that on the 8.4.* versions, a lot of our code is either taking much longer to complete, or never completing. I think I have isolated the problem to queries using in(), not in() or not exists(). I've put together a test case with one particular query that demonstrates the problem.

select count(*) from traderhank.vendor_catalog = 147,352

select count(*) from traderhank.xc_products = 8,610

The sub query (select vc1.th_sku from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null
group by vc1.th_sku
)  yields 54,390 rows

The sub query (select vc_th_Sku from traderhank.xc_products where vc_th_sku is not null) yields 5,132 rows

These 2 tables have been loaded from a pg_dump on all servers, vacuum analyze run after load.

1st case: pg 8.3.1 using left join finishes the query in about 3.5 seconds

explain analyze
select vc.* from traderhank.vendor_catalog vc
left join
(
select vc1.th_sku from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null
group by vc1.th_sku
) vcj on vcj.th_sku = vc.th_sku
left join traderhank.xc_products xc on xc.vc_th_sku = vc.th_sku
where vcj.th_sku is null
and xc.vc_th_sku is null

"Merge Left Join (cost=71001.53..72899.35 rows=36838 width=310) (actual time=9190.446..10703.509 rows=78426 loops=1)"
"  Merge Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
"  Filter: (vc1.th_sku IS NULL)"
" -> Merge Left Join (cost=19362.72..20201.46 rows=73676 width=310) (actual time=917.947..1784.593 rows=141962 loops=1)"
"        Merge Cond: ((vc.th_sku)::text = (xc.vc_th_sku)::text)"
"        Filter: (xc.vc_th_sku IS NULL)"
" -> Sort (cost=17630.88..17999.26 rows=147352 width=310) (actual time=871.130..1114.453 rows=147352 loops=1)"
"              Sort Key: vc.th_sku"
"              Sort Method:  quicksort  Memory: 45285kB"
" -> Seq Scan on vendor_catalog vc (cost=0.00..4981.52 rows=147352 width=310) (actual time=0.020..254.023 rows=147352 loops=1)" " -> Sort (cost=1731.84..1753.37 rows=8610 width=8) (actual time=46.783..62.347 rows=9689 loops=1)"
"              Sort Key: xc.vc_th_sku"
"              Sort Method:  quicksort  Memory: 734kB"
" -> Seq Scan on xc_products xc (cost=0.00..1169.10 rows=8610 width=8) (actual time=0.013..25.490 rows=8610 loops=1)" " -> Sort (cost=51638.80..51814.57 rows=70309 width=32) (actual time=8272.483..8382.258 rows=66097 loops=1)"
"        Sort Key: vc1.th_sku"
"        Sort Method:  quicksort  Memory: 4086kB"
" -> HashAggregate (cost=44572.25..45275.34 rows=70309 width=8) (actual time=7978.928..8080.317 rows=54390 loops=1)" " -> Merge Join (cost=27417.09..42493.30 rows=831580 width=8) (actual time=1317.874..6380.928 rows=810012 loops=1)" " Merge Cond: ((vc1.short_desc_75)::text = (vc2.short_desc_75)::text)" " Join Filter: ((vc2.th_sku)::text <> (vc1.th_sku)::text)" " -> Sort (cost=13708.55..13970.22 rows=104669 width=27) (actual time=661.319..834.131 rows=104624 loops=1)"
"                          Sort Key: vc1.short_desc_75"
"                          Sort Method:  quicksort  Memory: 11235kB"
" -> Seq Scan on vendor_catalog vc1 (cost=0.00..4981.52 rows=104669 width=27) (actual time=0.010..268.552 rows=104624 loops=1)"
"                                Filter: (cutoff_date IS NULL)"
" -> Sort (cost=13708.55..13970.22 rows=104669 width=27) (actual time=656.447..2130.290 rows=914636 loops=1)"
"                          Sort Key: vc2.short_desc_75"
"                          Sort Method:  quicksort  Memory: 11235kB"
" -> Seq Scan on vendor_catalog vc2 (cost=0.00..4981.52 rows=104669 width=27) (actual time=0.015..266.926 rows=104624 loops=1)"
"                                Filter: (cutoff_date IS NULL)"
"Total runtime: 10837.005 ms"


This query returns same set of rows, in about 2.8 seconds:

explain analyze
select vc.* from traderhank.vendor_catalog vc
where vc.th_sku not in
(
select vc1.th_sku from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku is not null
group by vc1.th_sku
)
and vc.th_sku not in
(select vc_th_Sku from traderhank.xc_products where vc_th_sku is not null)


"Seq Scan on vendor_catalog vc (cost=46633.03..52351.31 rows=36838 width=310) (actual time=8216.197..8506.825 rows=78426 loops=1)"
"  Filter: ((NOT (hashed subplan)) AND (NOT (hashed subplan)))"
"  SubPlan"
" -> Seq Scan on xc_products (cost=0.00..1169.10 rows=5129 width=8) (actual time=0.026..16.907 rows=5132 loops=1)"
"          Filter: (vc_th_sku IS NOT NULL)"
" -> HashAggregate (cost=44572.25..45275.34 rows=70309 width=8) (actual time=7973.792..8076.297 rows=54390 loops=1)" " -> Merge Join (cost=27417.09..42493.30 rows=831580 width=8) (actual time=1325.988..6377.197 rows=810012 loops=1)" " Merge Cond: ((vc1.short_desc_75)::text = (vc2.short_desc_75)::text)"
"                Join Filter: ((vc2.th_sku)::text <> (vc1.th_sku)::text)"
" -> Sort (cost=13708.55..13970.22 rows=104669 width=27) (actual time=669.237..841.978 rows=104624 loops=1)"
"                      Sort Key: vc1.short_desc_75"
"                      Sort Method:  quicksort  Memory: 11235kB"
" -> Seq Scan on vendor_catalog vc1 (cost=0.00..4981.52 rows=104669 width=27) (actual time=0.014..272.037 rows=104624 loops=1)" " Filter: ((cutoff_date IS NULL) AND (th_sku IS NOT NULL))" " -> Sort (cost=13708.55..13970.22 rows=104669 width=27) (actual time=656.638..2130.440 rows=914636 loops=1)"
"                      Sort Key: vc2.short_desc_75"
"                      Sort Method:  quicksort  Memory: 11235kB"
" -> Seq Scan on vendor_catalog vc2 (cost=0.00..4981.52 rows=104669 width=27) (actual time=0.016..266.767 rows=104624 loops=1)"
"                            Filter: (cutoff_date IS NULL)"
"Total runtime: 8631.652 ms"


So far, so good.

Same 2 queries on 8.4.2:

Left join version will return same rows in about 42 seconds

explain analyze
select vc.* from traderhank.vendor_catalog vc
left join
(
select vc1.th_sku from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null
group by vc1.th_sku
) vcj on vcj.th_sku = vc.th_sku
left join traderhank.xc_products xc on xc.vc_th_sku = vc.th_sku
where vcj.th_sku is null
and xc.vc_th_sku is null

"Hash Anti Join (cost=142357.84..167341.98 rows=140877 width=309) (actual time=42455.615..44244.251 rows=78426 loops=1)"
"  Hash Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
" -> Hash Anti Join (cost=1829.48..12853.64 rows=141143 width=309) (actual time=62.380..1049.863 rows=141962 loops=1)"
"        Hash Cond: ((vc.th_sku)::text = (xc.vc_th_sku)::text)"
" -> Seq Scan on vendor_catalog vc (cost=0.00..8534.52 rows=147352 width=309) (actual time=0.009..351.005 rows=147352 loops=1)" " -> Hash (cost=1716.99..1716.99 rows=8999 width=8) (actual time=62.348..62.348 rows=5132 loops=1)" " -> Seq Scan on xc_products xc (cost=0.00..1716.99 rows=8999 width=8) (actual time=0.009..45.818 rows=8610 loops=1)" " -> Hash (cost=139067.10..139067.10 rows=75541 width=32) (actual time=42393.149..42393.149 rows=54390 loops=1)" " -> Group (cost=134997.43..138311.69 rows=75541 width=8) (actual time=35987.418..42264.948 rows=54390 loops=1)" " -> Sort (cost=134997.43..136654.56 rows=662853 width=8) (actual time=35987.407..40682.275 rows=810012 loops=1)"
"                    Sort Key: vc1.th_sku"
"                    Sort Method:  external merge  Disk: 14256kB"
" -> Merge Join (cost=39600.73..52775.08 rows=662853 width=8) (actual time=5762.785..13763.041 rows=810012 loops=1)" " Merge Cond: ((vc1.short_desc_75)::text = (vc2.short_desc_75)::text)" " Join Filter: ((vc2.th_sku)::text <> (vc1.th_sku)::text)" " -> Sort (cost=19800.37..20062.75 rows=104954 width=27) (actual time=2884.012..3604.405 rows=104624 loops=1)"
"                                Sort Key: vc1.short_desc_75"
"                                Sort Method:  external merge  Disk: 3776kB"
" -> Seq Scan on vendor_catalog vc1 (cost=0.00..8534.52 rows=104954 width=27) (actual time=0.009..395.976 rows=104624 loops=1)"
"                                      Filter: (cutoff_date IS NULL)"
" -> Materialize (cost=19800.37..21112.29 rows=104954 width=27) (actual time=2878.550..5291.205 rows=914636 loops=1)" " -> Sort (cost=19800.37..20062.75 rows=104954 width=27) (actual time=2878.538..3607.201 rows=104624 loops=1)"
"                                      Sort Key: vc2.short_desc_75"
" Sort Method: external merge Disk: 3776kB" " -> Seq Scan on vendor_catalog vc2 (cost=0.00..8534.52 rows=104954 width=27) (actual time=0.018..392.270 rows=104624 loops=1)"
"                                            Filter: (cutoff_date IS NULL)"
"Total runtime: 45145.977 ms"



on any version from 8.3.8 on, this query has never returned, and explain analyze never returns, so I am only posting explain output

explain --analyze
select vc.* from traderhank.vendor_catalog vc
where vc.th_sku not in
(
select vc1.th_sku from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku is not null
group by vc1.th_sku
)
and vc.th_sku not in
(select vc_th_Sku from traderhank.xc_products where vc_th_sku is not null)


"Seq Scan on vendor_catalog vc (cost=140413.05..91527264.28 rows=36838 width=309)"
"  Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1)))"
"  SubPlan 2"
"    ->  Seq Scan on xc_products  (cost=0.00..1716.99 rows=5132 width=8)"
"          Filter: (vc_th_sku IS NOT NULL)"
"  SubPlan 1"
"    ->  Materialize  (cost=138683.23..139734.64 rows=75541 width=8)"
"          ->  Group  (cost=134997.43..138311.69 rows=75541 width=8)"
"                ->  Sort  (cost=134997.43..136654.56 rows=662853 width=8)"
"                      Sort Key: vc1.th_sku"
" -> Merge Join (cost=39600.73..52775.08 rows=662853 width=8)" " Merge Cond: ((vc1.short_desc_75)::text = (vc2.short_desc_75)::text)" " Join Filter: ((vc2.th_sku)::text <> (vc1.th_sku)::text)" " -> Sort (cost=19800.37..20062.75 rows=104954 width=27)"
"                                  Sort Key: vc1.short_desc_75"
" -> Seq Scan on vendor_catalog vc1 (cost=0.00..8534.52 rows=104954 width=27)" " Filter: ((cutoff_date IS NULL) AND (th_sku IS NOT NULL))" " -> Materialize (cost=19800.37..21112.29 rows=104954 width=27)" " -> Sort (cost=19800.37..20062.75 rows=104954 width=27)"
"                                        Sort Key: vc2.short_desc_75"
" -> Seq Scan on vendor_catalog vc2 (cost=0.00..8534.52 rows=104954 width=27)" " Filter: (cutoff_date IS NULL)"




I've also tried changing the code to not exists, but that query never comes back on any version I have available:

explain --analyze
select vc.* from traderhank.vendor_catalog vc
where not exists
(
select 1 from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku = vc.th_sku
group by vc1.th_sku
)
and not exists
(select 1 from traderhank.xc_products where vc_th_sku is not null and vc_th_sku = vc.th_sku)

"Nested Loop Anti Join  (cost=63650.74..93617.53 rows=1 width=309)"
"  Join Filter: ((xc_products.vc_th_sku)::text = (vc.th_sku)::text)"
"  ->  Hash Anti Join  (cost=63650.74..91836.39 rows=1 width=309)"
"        Hash Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
" -> Seq Scan on vendor_catalog vc (cost=0.00..8534.52 rows=147352 width=309)"
"        ->  Hash  (cost=52775.08..52775.08 rows=662853 width=8)"
" -> Merge Join (cost=39600.73..52775.08 rows=662853 width=8)" " Merge Cond: ((vc1.short_desc_75)::text = (vc2.short_desc_75)::text)" " Join Filter: ((vc2.th_sku)::text <> (vc1.th_sku)::text)" " -> Sort (cost=19800.37..20062.75 rows=104954 width=27)"
"                          Sort Key: vc1.short_desc_75"
" -> Seq Scan on vendor_catalog vc1 (cost=0.00..8534.52 rows=104954 width=27)"
"                                Filter: (cutoff_date IS NULL)"
" -> Materialize (cost=19800.37..21112.29 rows=104954 width=27)" " -> Sort (cost=19800.37..20062.75 rows=104954 width=27)"
"                                Sort Key: vc2.short_desc_75"
" -> Seq Scan on vendor_catalog vc2 (cost=0.00..8534.52 rows=104954 width=27)"
"                                      Filter: (cutoff_date IS NULL)"
"  ->  Seq Scan on xc_products  (cost=0.00..1716.99 rows=5132 width=8)"
"        Filter: (xc_products.vc_th_sku IS NOT NULL)"



So, my question is, do I need to re-write all of my in() and not in () queries to left joins, is this something that might get resolved in another release in the future?

Thanks for any help.

Roger Ging



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux