Re: Slow query with a lot of data

On 19.08.2008, at 14:17, Matthew Wakeling wrote:

> On Tue, 19 Aug 2008, Moritz Onken wrote:
>>       tablename        | attname | n_distinct | correlation
>> result                 | domain  |       1642 |           1
>
> Well, the important thing is the correlation, which is 1, indicating that Postgres knows that the table is clustered. So I have no idea why it is sorting the entire table.
>
> What happens when you run EXPLAIN SELECT * FROM result ORDER BY domain?


"Index Scan using result_domain_idx on result (cost=0.00..748720.72 rows=20306816 width=49)"
... as it should be.
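
For reference, the n_distinct/correlation figures quoted above presumably came from pg_stats; a query along these lines should reproduce them (a sketch, assuming the standard pg_stats view):

select tablename, attname, n_distinct, correlation
  from pg_stats
  where tablename = 'result' and attname = 'domain';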

>>> Sounds like an awfully long time to me. Also, I think restricting it to 280 users is probably not making it any faster.
>>
>> If I hadn't restricted it to 280 users it would have run ~350 days...
>
> What makes you say that? Perhaps you could post EXPLAINs of both of the queries.
>
> Matthew

That was just a guess. The query needs to retrieve the data for about 50,000 users. But it should be fast if I don't restrict it to specific users and instead let it run through all rows.

explain insert into setup1 (select
  a."user", b.category, sum(1.0/b.cat_count)::float
  from result a, domain_categories b
  where a."domain" = b."domain"
  and b.depth < 4
  and a.results > 100
  and a."user" < 30000
  group by a."user", b.category);


"GroupAggregate  (cost=11745105.66..12277396.81 rows=28704 width=12)"
"  ->  Sort  (cost=11745105.66..11878034.93 rows=53171707 width=12)"
"        Sort Key: a."user", b.category"
" -> Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
"              Merge Cond: (b.domain = a.domain)"
" -> Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
"                    Filter: (depth < 4)"
" -> Materialize (cost=148954.16..149446.36 rows=39376 width=8)" " -> Sort (cost=148954.16..149052.60 rows=39376 width=8)"
"                          Sort Key: a.domain"
" -> Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
"                                Recheck Cond: ("user" < 30000)"
"                                Filter: (results > 100)"
" -> Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
"                                      Index Cond: ("user" < 30000)"


This query limits the number of users to 215, and it took about 50 minutes. I could create two temp tables which hold only the records I need for this query. Would that be a good idea?
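
Something like this is what I have in mind (a sketch; the *_subset table names are made up, and the ANALYZE steps are there so the planner gets statistics on the temp tables):

create temp table result_subset as      -- hypothetical name
  select "user", "domain" from result
  where results > 100 and "user" < 30000;
analyze result_subset;

create temp table domain_categories_subset as      -- hypothetical name
  select "domain", category, cat_count from domain_categories
  where depth < 4;
analyze domain_categories_subset;

insert into setup1 (select
  a."user", b.category, sum(1.0/b.cat_count)::float
  from result_subset a, domain_categories_subset b
  where a."domain" = b."domain"
  group by a."user", b.category);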


moritz


