On 6/22/09 2:11 PM, "Peter Alban" <peter.alban2@xxxxxxxxx> wrote:

> hey folks !
>
> Eventually, removing the GROUP BY did improve things, but my concern is still
> why we can't take the result from memory, given it's the same result set.
> But I keep pushing for the developers to move to memcached so we can overcome
> this limitation.
>
> cheers,
> Peter

Caching of that sort is better suited to client code or a middle-tier caching
technology.  It isn't that simple to determine that the same query will return
the same result: between two executions another session could have made a
change.  But more importantly, the client is what knows what level of
'staleness' is appropriate for the data.  An RDBMS operates with zero tolerance
for returning stale data and with strict transactional visibility rules.  If
your application only needs a result that is fresh within some window of time,
then it (or another component) should do the caching.  This is far more
efficient.  An RDBMS is a very poorly performing data cache: it is built to
quickly resolve arbitrary relational queries with strict transactional
guarantees, not to cache a set of answers.

Although given a hammer, everything looks like a nail . . .
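
To make the "fresh within some window of time" point concrete, here is a rough
sketch of that kind of middle-tier cache.  Python with pymemcache and psycopg2
is only an assumption for illustration (any client stack works the same way);
the connection string, cache key and 60-second TTL are placeholders, and the
table and column names come from the query quoted below:

import json

import psycopg2
from pymemcache.client.base import Client

CACHE_TTL = 60                    # seconds of acceptable staleness: the application decides this
CACHE_KEY = "latest_comments"     # made-up cache key
memcache = Client(("localhost", 11211))

# Same query as in the thread below, with the GROUP BY dropped: it listed every
# selected column, so no rows were actually being collapsed by it.
LATEST_COMMENTS_SQL = """
    SELECT news.url_text, news.title, comments.name, comments.createdate,
           comments.user_id, comments.comment
      FROM news
      JOIN comments ON comments.cid = news.id
     WHERE comments.published = '1'
     ORDER BY comments.createdate DESC
     LIMIT 3
"""

def latest_comments(conn):
    """Return the newest published comments, at most CACHE_TTL seconds stale."""
    cached = memcache.get(CACHE_KEY)
    if cached is not None:
        # Serve the slightly stale copy (dates come back as strings here,
        # which is fine for display).
        return json.loads(cached)

    # Cache miss: hit the database once, then store the result for CACHE_TTL.
    with conn.cursor() as cur:
        cur.execute(LATEST_COMMENTS_SQL)
        rows = cur.fetchall()

    # default=str turns the createdate timestamps into strings for JSON.
    memcache.set(CACHE_KEY, json.dumps(rows, default=str).encode("utf-8"),
                 expire=CACHE_TTL)
    return rows

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=yourdb")   # placeholder connection string
    print(latest_comments(conn))

The application decides how stale is acceptable; the database only sees a query
on a cache miss.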
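
And for the query itself, here is a sketch of the two suggestions from Robert's
mail quoted below (the index name and connection string are, again, just
placeholders):

import psycopg2

conn = psycopg2.connect("dbname=yourdb")   # placeholder connection string
conn.autocommit = True                     # run the DDL and ANALYZE outside an explicit transaction
cur = conn.cursor()

# 1. Index on the ORDER BY column, then refresh the planner statistics.
cur.execute("CREATE INDEX comments_createdate_idx ON comments (createdate)")
cur.execute("ANALYZE comments")

# 2. The same query with the probably-unnecessary GROUP BY removed; the plan
#    should no longer contain the HashAggregate step.
cur.execute("""
    EXPLAIN ANALYZE
    SELECT news.url_text, news.title, comments.name, comments.createdate,
           comments.user_id, comments.comment
      FROM news
      JOIN comments ON comments.cid = news.id
     WHERE comments.published = '1'
     ORDER BY comments.createdate DESC
     LIMIT 3
""")
for (line,) in cur.fetchall():
    print(line)

With the GROUP BY gone and an index on createdate, the planner at least has the
option of walking that index backwards and stopping after three matching rows
instead of hashing and sorting the whole join; comparing the EXPLAIN ANALYZE
output before and after will show whether it actually does.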
> On Mon, Jun 22, 2009 at 5:23 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
>> On Sun, Jun 21, 2009 at 4:59 PM, Peter Alban <peter.alban2@xxxxxxxxx> wrote:
>>> On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf <justin@xxxxxxxxxxxxxxx> wrote:
>>>> Peter Alban wrote:
>>>>
>>>> duration: 2533.734 ms  statement:
>>>>
>>>> SELECT news.url_text, news.title, comments.name, comments.createdate,
>>>>        comments.user_id, comments.comment
>>>> FROM news, comments
>>>> WHERE comments.cid = news.id AND comments.published = '1'
>>>> GROUP BY news.url_text, news.title, comments.name, comments.createdate,
>>>>          comments.user_id, comments.comment
>>>> ORDER BY comments.createdate DESC
>>>> LIMIT 3
>>>>
>>>> And here is the query plan :
>>>>
>>>>                                   QUERY PLAN
>>>> ----------------------------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1)
>>>>    ->  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual time=288.523..288.523 rows=3 loops=1)
>>>>          Sort Key: comments.createdate
>>>>          ->  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595) (actual time=137.521..148.132 rows=13415 loops=1)
>>>>                ->  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595) (actual time=14.298..51.049 rows=13578 loops=1)
>>>>                      Hash Cond: ("outer".cid = "inner".id)
>>>>                      ->  Seq Scan on comments  (cost=0.00..1178.72 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
>>>>                            Filter: (published = 1)
>>>>                      ->  Hash  (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 rows=3818 loops=1)
>>>>                            ->  Seq Scan on news  (cost=0.00..1391.18 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)
>>>>
>>>> The same query is being requested from different sessions. So why is it not
>>>> being cached?
>>>>
>>>> Because the query results are not cached; only the raw tables are.  The
>>>> query is rerun every time it is requested.
>>>>
>>>> What is the GROUP BY clause accomplishing?
>>>> The sorting and HashAggregate are eating up all the time.
>>>
>>> So should this mean that running such a query on a, say, 5 MB table held
>>> in memory takes 2 seconds?
>>
>> Nope.  But as others have pointed out, you need to figure out why it's
>> taking 2.5 s when EXPLAIN ANALYZE is only saying 300 ms.
>>
>> There are other things you can do to optimize this query; for example:
>>
>> 1. Try creating an index on comments (createdate), and don't forget to
>> ANALYZE the table afterward, or
>>
>> 2. Modify the query to remove the probably-unnecessary GROUP BY.
>>
>> But figuring out the times may be the first thing.  My guess is that
>> the 2.5 s time is a time from your logs, maybe from a time when the
>> system was busy, and the 300 ms time is what you got when you ran
>> it some other time.  But maybe there's some other explanation.  You
>> should try to figure it out.
>>
>> ...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance