Re: same query in high number of times

Peter Alban <peter.alban2@xxxxxxxxx> · Mon, 22 Jun 2009 23:11:06 +0200

hey folks ! 

eventually the removing of the group by did improve but still my concern is why cant we take the result from memory given its same resultset . 
But I keep pusing for the developers to move to memcached so we overcome this limitation . 

cheers,
Peter

On Mon, Jun 22, 2009 at 5:23 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:

On Sun, Jun 21, 2009 at 4:59 PM, Peter Alban<peter.alban2@xxxxxxxxx> wrote:

>

>

> On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf <justin@xxxxxxxxxxxxxxx>

> wrote:

>>

>> Peter Alban wrote:

>>

>> duration: 2533.734 ms  statement:

>>

>> SELECT news.url_text,news.title, comments.name, comments.createdate,

>> comments.user_id, comments.comment FROM news, comments WHERE

>> comments.cid=news.id  AND comments.published='1' GROUP BY

>> news.url_text,news.title comments.name, comments.createdate,

>> comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT 3

>>

>>

>> And here is the query plan :

>>                                                               QUERY

>> PLAN

>>

>> ----------------------------------------------------------------------------------------------------------------------------------------

>>  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual

>> time=288.525..288.528 rows=3 loops=1)

>>    ->  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual

>> time=288.523..288.523 rows=3 loops=1)

>>          Sort Key: comments.createdate

>>          ->  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595)

>> (actual time=137.521..148.132 rows=13415 loops=1)

>>                ->  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595)

>> (actual time=14.298..51.049 rows=13578 loops=1)

>>                      Hash Cond: ("outer".cid = "inner".id)

>>                      ->  Seq Scan on comments  (cost=0.00..1178.72

>> rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)

>>                            Filter: (published = 1)

>>                      ->  Hash  (cost=1391.18..1391.18 rows=3818 width=81)

>> (actual time=14.268..14.268 rows=3818 loops=1)

>>                            ->  Seq Scan on news  (cost=0.00..1391.18

>> rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)

>>

>> The same is being requested from different sessions . So why is it not

>> being cached .

>>

>>

>> Because the query results are not cached only the RAW tables are.   The

>> query is rerun every time it is requested.

>>

>> What is the group by clause accomplishing???

>> The sorting and hash Aggregate is eating up all the time

>

> So this should mean that having say a 5 mb table in memory doing such query

> above takes 2 secs in memory ?

Nope.  But as others have pointed out, you need to figure out why it's

taking 2.5 s but EXPLAIN ANALYZE is only saying 300 ms.

There's other things you can do to optimize this query; for example:

1. Try creating an index on comments (createdate), and don't forget to

ANALYZE the table afterward, or

2. Modify the query to remove the probably-unnecessary GROUP BY.

But figuring out the times may be the first thing.  My guess is that

the 2.5 s time is a time from your logs, maybe at a time when the

system was busy, and the 300 ms time was what you got it when you ran

it some other time.  But maybe there's some other explanation.  You

should try to figure it out.

...Robert