Re: shared_buffers/effective_cache_size on 96GB server

Jeff Janes <jeff.janes@xxxxxxxxx> · Thu, 18 Oct 2012 10:54:45 -0700

On Thu, Oct 11, 2012 at 11:17 AM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
>
>> Does anyone see effective_cache_size make a difference anyway?  If so,
>> in what circumstances?
>
> E_C_S, together with random_page_cost, the table and index sizes, the
> row estimates and the cpu_* costs, form an equation which estimates the
> cost of doing various kinds of scans, particularly index scan vs. table
> scan.

E_C_S only comes into play when the same table pages are (predicted to
be) visited repeatedly during the index scan, but this is the same
situation in which a bitmap scan is generally preferred anyway.  In
fact the two seem to be conceptually very similar (either avoid
actually visiting the block repeatedly, or avoid the IO cost of
visiting the block repeatedly), and I'm not sure why bitmap scans
comes out on top--there doesn't seem to be a CPU cost estimate of
visiting a block which is assumed to already be in memory, nor is
bitmap scan given credit for the use of effective_io_concurrency.

But I found a simple case (over in "Unused index influencing
sequential scan plan") which is very sensitive to E_C_S.  When the
index scan is being done to avoid a costly sort or aggregation, then
it can't be usefully replaced with a bitmap scan since it won't
produce index-order sorted output.

>> In my hands, queries for which effective_cache_size might come into
>> play (for deciding between seq scan and index scan) are instead
>> planned as bitmap scans.
>
> You have a very unusual workload, or a very small database.

I think all real workloads are unusual, otherwise benchmarking would
be easy...but since complex queries are intractable to figure out what
the planner is thinking, I'm biased to using simple ones when trying
to figure out general principles.  I can make the database look as big
or small as I want (relative to RAM), by feeding effective_cache_size
false information.

Anyway, it seems like the consequences of overestimating E_C_S (by
underestimating the number of processes that might expect to benefit
from it concurrently) are worse than the consequences of
underestimating it--assuming you have the types of queries for which
it makes much of a difference.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance