On 6/22/09 2:11 PM, "Peter Alban" <peter.alban2@xxxxxxxxx> wrote:

> hey folks !
>
> Eventually, removing the GROUP BY did improve things, but my concern is still
> why we can't take the result from memory, given it's the same result set.
> But I keep pushing for the developers to move to memcached so we can overcome
> this limitation.
>
> cheers,
> Peter

Caching of that sort is better suited to client code or a middle-tier caching
technology.  It isn't that simple to determine that the same query will return
the same result: between two executions another session could have made a
change.  But more importantly, the client is what knows what level of
'staleness' is appropriate for the data.  An RDBMS operates with zero tolerance
for returning stale data and with strict transactional visibility rules.  If
your application only needs a result that is fresh within some window of time,
then it (or another component) should do the caching.  This is far more
efficient.  An RDBMS is a very poorly performing data cache: it is built to
quickly resolve arbitrary relational queries with strict transactional
guarantees, not to cache a set of answers.

Although given a hammer, everything looks like a nail . . .
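
To make the "fresh within some window of time" point concrete, here is a rough
sketch of that kind of middle-tier cache.  Python with pymemcache and psycopg2
is only an assumption for illustration (any client stack works the same way);
the connection string, cache key and 60-second TTL are placeholders, and the
table and column names come from the query quoted below:

import json

import psycopg2
from pymemcache.client.base import Client

CACHE_TTL = 60                    # seconds of acceptable staleness: the application decides this
CACHE_KEY = "latest_comments"     # made-up cache key
memcache = Client(("localhost", 11211))

# Same query as in the thread below, with the GROUP BY dropped: it listed every
# selected column, so no rows were actually being collapsed by it.
LATEST_COMMENTS_SQL = """
    SELECT news.url_text, news.title, comments.name, comments.createdate,
           comments.user_id, comments.comment
      FROM news
      JOIN comments ON comments.cid = news.id
     WHERE comments.published = '1'
     ORDER BY comments.createdate DESC
     LIMIT 3
"""

def latest_comments(conn):
    """Return the newest published comments, at most CACHE_TTL seconds stale."""
    cached = memcache.get(CACHE_KEY)
    if cached is not None:
        # Serve the slightly stale copy (dates come back as strings here,
        # which is fine for display).
        return json.loads(cached)

    # Cache miss: hit the database once, then store the result for CACHE_TTL.
    with conn.cursor() as cur:
        cur.execute(LATEST_COMMENTS_SQL)
        rows = cur.fetchall()

    # default=str turns the createdate timestamps into strings for JSON.
    memcache.set(CACHE_KEY, json.dumps(rows, default=str).encode("utf-8"),
                 expire=CACHE_TTL)
    return rows

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=yourdb")   # placeholder connection string
    print(latest_comments(conn))

The application decides how stale is acceptable; the database only sees a query
on a cache miss.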
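
And for the query itself, here is a sketch of the two suggestions from Robert's
mail quoted below (the index name and connection string are, again, just
placeholders):

import psycopg2

conn = psycopg2.connect("dbname=yourdb")   # placeholder connection string
conn.autocommit = True                     # run the DDL and ANALYZE outside an explicit transaction
cur = conn.cursor()

# 1. Index on the ORDER BY column, then refresh the planner statistics.
cur.execute("CREATE INDEX comments_createdate_idx ON comments (createdate)")
cur.execute("ANALYZE comments")

# 2. The same query with the probably-unnecessary GROUP BY removed; the plan
#    should no longer contain the HashAggregate step.
cur.execute("""
    EXPLAIN ANALYZE
    SELECT news.url_text, news.title, comments.name, comments.createdate,
           comments.user_id, comments.comment
      FROM news
      JOIN comments ON comments.cid = news.id
     WHERE comments.published = '1'
     ORDER BY comments.createdate DESC
     LIMIT 3
""")
for (line,) in cur.fetchall():
    print(line)

With the GROUP BY gone and an index on createdate, the planner at least has the
option of walking that index backwards and stopping after three matching rows
instead of hashing and sorting the whole join; comparing the EXPLAIN ANALYZE
output before and after will show whether it actually does.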
> On Mon, Jun 22, 2009 at 5:23 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
>> On Sun, Jun 21, 2009 at 4:59 PM, Peter Alban <peter.alban2@xxxxxxxxx> wrote:
>>> On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf <justin@xxxxxxxxxxxxxxx> wrote:
>>>> Peter Alban wrote:
>>>>
>>>> duration: 2533.734 ms  statement:
>>>>
>>>> SELECT news.url_text, news.title, comments.name, comments.createdate,
>>>>        comments.user_id, comments.comment
>>>> FROM news, comments
>>>> WHERE comments.cid = news.id AND comments.published = '1'
>>>> GROUP BY news.url_text, news.title, comments.name, comments.createdate,
>>>>          comments.user_id, comments.comment
>>>> ORDER BY comments.createdate DESC
>>>> LIMIT 3
>>>>
>>>> And here is the query plan :
>>>>
>>>>                                   QUERY PLAN
>>>> ----------------------------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1)
>>>>    ->  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual time=288.523..288.523 rows=3 loops=1)
>>>>          Sort Key: comments.createdate
>>>>          ->  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595) (actual time=137.521..148.132 rows=13415 loops=1)
>>>>                ->  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595) (actual time=14.298..51.049 rows=13578 loops=1)
>>>>                      Hash Cond: ("outer".cid = "inner".id)
>>>>                      ->  Seq Scan on comments  (cost=0.00..1178.72 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
>>>>                            Filter: (published = 1)
>>>>                      ->  Hash  (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 rows=3818 loops=1)
>>>>                            ->  Seq Scan on news  (cost=0.00..1391.18 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)
>>>>
>>>> The same query is being requested from different sessions. So why is it not
>>>> being cached?
>>>>
>>>> Because the query results are not cached; only the raw tables are.  The
>>>> query is rerun every time it is requested.
>>>>
>>>> What is the GROUP BY clause accomplishing?
>>>> The sorting and HashAggregate are eating up all the time.
>>>
>>> So should this mean that running such a query on a, say, 5 MB table held
>>> in memory takes 2 seconds?
>>
>> Nope.  But as others have pointed out, you need to figure out why it's
>> taking 2.5 s when EXPLAIN ANALYZE is only saying 300 ms.
>>
>> There are other things you can do to optimize this query; for example:
>>
>> 1. Try creating an index on comments (createdate), and don't forget to
>> ANALYZE the table afterward, or
>>
>> 2. Modify the query to remove the probably-unnecessary GROUP BY.
>>
>> But figuring out the times may be the first thing.  My guess is that
>> the 2.5 s time is a time from your logs, maybe from a time when the
>> system was busy, and the 300 ms time is what you got when you ran
>> it some other time.  But maybe there's some other explanation.  You
>> should try to figure it out.
>>
>> ...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance