Slow query with a lot of data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I run this query:

select max(a."user"), b.category, count(1) from result a, domain_categories b where a."domain" = b."domain" group by b.category;

the table result contains all websites a user visited. And the table domain_categories contains all categories a domain is in. result has 20 Mio rows and domain_categories has about 12 Mio. There are 500.000 different users.

I have indexes on result.domain, domain_categories.domain, result.user, domain_categories.category. Clustered result on user and domain_categories on domain.

explain analyze says (limited to one user with id 1337):

"HashAggregate (cost=2441577.16..2441614.72 rows=2504 width=8) (actual time=94667.335..94671.508 rows=3361 loops=1)" " -> Merge Join (cost=2119158.02..2334105.00 rows=14329622 width=8) (actual time=63559.938..94621.557 rows=36308 loops=1)"
"        Merge Cond: (a.domain = b.domain)"
" -> Sort (cost=395.52..405.49 rows=3985 width=8) (actual time=0.189..0.211 rows=19 loops=1)"
"              Sort Key: a.domain"
"              Sort Method:  quicksort  Memory: 27kB"
" -> Index Scan using result_user_idx on result a (cost=0.00..157.21 rows=3985 width=8) (actual time=0.027..0.108 rows=61 loops=1)"
"                    Index Cond: ("user" = 1337)"
" -> Materialize (cost=2118752.28..2270064.64 rows=12104989 width=8) (actual time=46460.599..82336.116 rows=12123161 loops=1)" " -> Sort (cost=2118752.28..2149014.75 rows=12104989 width=8) (actual time=46460.592..59595.851 rows=12104989 loops=1)"
"                    Sort Key: b.domain"
"                    Sort Method:  external sort  Disk: 283992kB"
" -> Seq Scan on domain_categories b (cost=0.00..198151.89 rows=12104989 width=8) (actual time=14.352..22572.869 rows=12104989 loops=1)"
"Total runtime: 94817.058 ms"

This is running on a pretty small server with 1gb of ram and a slow sata hd. Shared_buffers is 312mb, max_fsm_pages = 153600. Everything else is commented out. Postgresql v8.3.3. Operating system Ubuntu 8.04.

It would be great if someone could help improve this query. This is for a research project at my university.

Thanks in advance,

Moritz



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux