Re: postgres optimization (effective_cache_size)

Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> · Mon, 22 Aug 2005 14:43:21 -0500

On Fri, 2005-08-12 at 17:48, ogjunk-pgjedan@xxxxxxxxx wrote:
> Hello,
> 
> I followed this advice for picking a good effective_cache_size value
> (below) from Scott Marlowe, and run into a bit of trouble:
> 
> I looked at the `top' output and saw "721380k cached".
> So I calculated the effective cache size using Scott's formula:
> 
>   721380/8 = 90172
> 
> Then I changed my effective_cache size from the previous 10000 to
> 90172:
> 
>   effective_cache_size = 90172   # typically 8KB each
> 
> 
> I restarted PG, and thought I'd see some performance improvements, but
> I actually saw degradation in performance.  All of a sudden a query
> that took a second started taking a few seconds, and consumed more CPU
> than before.

There are a lot of reasons why this may be so, but the most likely is
that when restarting postgresql, the kernel cache got old and was dumped
out, so that it wasn't holding all the data.

Basically, a larger effective cache size favors indexes over seq scan,
since the data are more likely to be in memory when you ask for them.

> Can anyone explain this drop in performance and increase in CPU usage
> and tell me what I did wrong?
> 
> I also noticed that the "721380k cached" number in top dropped to about
> 300000k (about a half).  Maybe that was simply due to PG restart?  If
> so, does that indicate the kernel had about 400,000K worth of PG data
> cached?

That's pretty normal, over time, with postgresql being idle.  Don't
worry about how the query runs the first time you run it so much as how
it runs consistently over time.  If the numbers come back to where they
were, then you've got the right setting.  Like I said, the
effective_cache_size is kind of a lead hammer, used to nudge the planner
one way or another but doesn't need to be exact to be useful.

Take a look at the plan being chosen, it is likely to be a random access
(i.e. use an index) right now rather than a seq scan.  The seq scan
would win on a machine with nothing in the kernel cache, but the index
scan will usually win if everything is already cached.  Again, these are
broad strokes of the brush.  It all depends on the actual % of a table
returned and a few other things...