Re: Query results caching?

"Dann Corbit" <DCorbit@xxxxxxxxx> · Mon, 22 Aug 2005 12:26:58 -0700

> -----Original Message-----
> From: Ben-Nes Yonatan [mailto:da@xxxxxxxxxxxx]
> Sent: Monday, August 22, 2005 1:14 PM
> To: Sean Davis; Dann Corbit
> Cc: pgsql-general@xxxxxxxxxxxxxx
> Subject: Re:  Query results caching?
> 
> Sean Davis wrote:
> > On 8/22/05 1:59 PM, "Dann Corbit" <DCorbit@xxxxxxxxx> wrote:
> >
> >
> >>
> >>>-----Original Message-----
> >>>From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-
> >>>owner@xxxxxxxxxxxxxx] On Behalf Of Ben-Nes Yonatan
> >>>Sent: Monday, August 22, 2005 9:03 AM
> >>>To: pgsql-general@xxxxxxxxxxxxxx
> >>>Subject:  Query results caching?
> >>>
> >>>Hi all,
> >>>
> >>>I dont know how its called but I noticed that when I query the db
for
> >>>the first time it give me the result slower then the next times ill
> >>>repeat the same exact query, I figure that its some kind of caching
so
> >>>henceforth the title of the mail :)
> >>
> >>The operating system and the database will both percolate frequently
> >>used information from disk into memory.  Particularly if they are
SELECT
> >>queries, they will get faster and faster.
> >>
> >>
> >>>Anyway I would want to be able to delete that "caching" after every
> >>>query test that I run, cause I want to see the real time results
for
> >>
> >>my
> >>
> >>>queries (its for a searching option for users so it will vary
alot).
> >>
> >>Those are the real times for your queries.
> >>
> >>
> >>>Is it possible to do it manually each time or maybe only from the
> >>>configuration?
> >>
> >>You will have to query a different table each time.
> >
> >
> > Just to extend this notion a bit, if you want to test your
application
> > speed, you may want to generate "real-world" input to determine the
> actual
> > behavior/speed under real conditions.  As Dann pointed out, the
results
> for
> > timings are "real" in that if the user generated the queries as you
did,
> the
> > timing results would be (nearly) the same as for you.  It seems that
> your
> > concern is that the user will not generate the same type of input
that
> you
> > did (that it will vary more), so the best solution may be to
actually
> > generate some test queries that actually conform to what you think
the
> user
> > input will look like.
> >
> > Sean
> >
> I think that I was misunderstood, Ill make an example:
> Lets say that im making the following query for the first time on the
> "motorcycles" table which got an index on the "manufacturer" field:
> 
> EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
> manufacturer='suzuki';
> ... Total runtime: 3139.587 ms
> 
> Now im doing the same query again and i get a much faster result
(cause
> of the "caching"): Total runtime: 332.53 ms
> 
> After both of those queries I drop the index and query the table again
> with the exact same query as before and now I receive: Total runtime:
> 216834.871 ms
> 
> And for my last check I run the exact same query again (without
creating
> the INDEX back again) and I get quite similar result to my third
query:
> Total runtime: 209218.01 ms

These results are all what I would expect.  When you delete the index,
the query will be forced to do a table scan (to examine every single
record in the table one by one).  If the table is non-trivial it is
unlikely that either the OS or the database will cache the whole thing
in memory.  However, when you query a small record set, then it is
likely to be retained in RAM which is literally thousands of times
faster than disk.

> My problem is that (maybe I just dont understand something basic
> here...) the last 2 (also the second query but I dont care about that)
> queries were using the "cache" that was created after the first query
> (which had an INDEX) so none of them actually showed me what will
happen
> if a client will do such a search (without an INDEX) for the first
time.

If a search is to be made on a frequent basis, you should create an
index.
The query results above show you why.

> I want to delete that "caching" after I do the first 2 queries so my
> next queries will show me "real life results".

Think about this for a minute.  The real life results you want are very
fast results.  For that reason, you should try to model the customer
queries as nearly as possible.  If you have a canned application like
order entry, then the real parameterized query set will probably be
quite small in real life.  If you are creating a server for ad-hoc
queries then it will be far more difficult to model in real life.

What is the real purpose of the application that you are writing?

Will users be using a pre-programmed front end, or will they be typing
in queries free-form for whatever their heart desires?

> Thanks alot again,
> Yonatan

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq