It looks to me like the work_mem did have an effect.
Your earlier queries had a sort followed by group aggregate at the top, and now its a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came from.
The top of the plan was:
GroupAggregate (cost=11745105.66..12277396.
81 rows=28704 width=12)"
-> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
-> Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
Merge Cond: (b.domain = a.domain)"
and now it is:
"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
-> Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"
Merge Cond: (b.domain = a.domain)"
The HashAggregate replaced the Sort followed by GroupAggregate at about 1/10 the cost.
It probably only took the first couple hundred MB of work_mem to do this, or less given that you were at the default originally.
Note how the estimated cost on the latter is 1.6 million, and it is 11 million in the first one.
You won't get a large table aggregate significantly faster than this -- you're asking it to scan through 53 million records and aggregate. An explain analyze will be somewhat instructive to help identify if there is more I/O or CPU bound overall as we can compare the estimated cost with the actual times, but this probably won't get very far.
After that, inserting 16M rows requires rather different tuning and bottleneck identification.
Your earlier queries had a sort followed by group aggregate at the top, and now its a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came from.
The top of the plan was:
GroupAggregate (cost=11745105.66..12277396.
81 rows=28704 width=12)"
-> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
-> Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
Merge Cond: (b.domain = a.domain)"
and now it is:
"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
-> Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"
Merge Cond: (b.domain = a.domain)"
The HashAggregate replaced the Sort followed by GroupAggregate at about 1/10 the cost.
Note how the estimated cost on the latter is 1.6 million, and it is 11 million in the first one.
You won't get a large table aggregate significantly faster than this -- you're asking it to scan through 53 million records and aggregate. An explain analyze will be somewhat instructive to help identify if there is more I/O or CPU bound overall as we can compare the estimated cost with the actual times, but this probably won't get very far.
After that, inserting 16M rows requires rather different tuning and bottleneck identification.
On Thu, Aug 21, 2008 at 12:03 AM, Moritz Onken <onken@xxxxxxxxxxxxxxxx> wrote:
Am 20.08.2008 um 20:06 schrieb Scott Carey:"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
Ok, so the problem boils down to the sort at the end.
The query up through the merge join on domain is as fast as its going to get. The sort at the end however, should not happen ideally. There are not that many rows returned, and it should hash_aggregate if it thinks there is enough space to do so.
The query planner is going to choose the sort > agg over the hash-agg if it estimates the total number of resulting rows to be large enough so that the hash won't fit in work_mem. However, there seems to be another factor here based on this:
GroupAggregate (cost=11745105.66..12277396.
81 rows=28704 width=12)"
" -> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
" Sort Key: a."user", b.category"
" -> Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
" Merge Cond: (b.domain = a.domain)"
The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. Thats either a bug or there's something else wrong here. That is the wrong way to aggregate those results no matter how much work_mem you have unless I'm completely missing something...
You can try rearranging the query just to see if you can work around this. What happens if you compare the explain on:
select
a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
and a."user" < 30000
group by a."user", b.category
" -> Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"" -> Sort (cost=148415.16..148513.60 rows=39376 width=8)"
" Merge Cond: (b.domain = a.domain)"
" -> Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
" Filter: (depth < 4)"
" Sort Key: a.domain"
" -> Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
" Recheck Cond: ("user" < 30000)"
" Filter: (results > 100)"
" -> Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
" Index Cond: ("user" < 30000)"
"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"to
select
c."user", c.category, sum(1.0/c.cat_count)::float
from (select a."user", b.category, b.cat_count
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
and a."user" < 30000 ) c
group by c."user", c.category
" -> Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"" -> Sort (cost=148415.16..148513.60 rows=39376 width=8)"
" Merge Cond: (b.domain = a.domain)"
" -> Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
" Filter: (depth < 4)"
" Sort Key: a.domain"
" -> Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
" Recheck Cond: ("user" < 30000)"
" Filter: (results > 100)"
" -> Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
" Index Cond: ("user" < 30000)"
It's exactly the same. work_mem was set to 3000MB.It shouldn't make a difference, but I've seen things like this help before so its worth a try. Make sure work_mem is reasonably sized for this test.
I'll try this.
Another thing that won't be that fast, but may avoid the sort, is to select the subselection above into a temporary table, analyze it, and then do the outer select. Make sure your settings for temporary space (temp_buffers in 8.3) are large enough for the intermediate results (700MB should do it). That won't be that fast, but it will most likely be faster than sorting 50 million + rows. There are lots of problems with this approach but it may be worth the experiment.
Thanks so far!
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance