2010/11/11 Tom Lane <tgl@xxxxxxxxxxxxx>:
> Robert Haas <robertmhaas@xxxxxxxxx> writes:
>> Yeah.  For Kevin's case, it seems like we want the caching percentage
>> to vary not so much based on which table we're hitting at the moment
>> but on how much of it we're actually reading.
>
> Well, we could certainly take the expected number of pages to read and
> compare that to effective_cache_size.  The thing that's missing in that
> equation is how much other stuff is competing for cache space.  I've
> tried to avoid having the planner need to know the total size of the
> database cluster, but it's kind of hard to avoid that if you want to
> model this honestly.
>
> Would it be at all workable to have an estimate that so many megs of a
> table are in cache

Yes, with Linux at least.

> (independently of any other table), and then we could
> scale the cost based on the expected number of pages to read versus that
> number?  The trick here is that DBAs really aren't going to want to set
> such a per-table number (at least, most of the time) so we need a
> formula to get to a default estimate for that number based on some simple
> system-wide parameters.  I'm not sure if that's any easier.

My current ideas for a future POC with pgfincore revolve around what has
been said in this thread.

I'd like to have some maintenance machinery, similar to auto-ANALYZE, that
reports how much of each table and index is in the OS cache: the percentage
of the data that is cached, and how that cached data is distributed
(perhaps only the last 15% of the table is in cache, or perhaps 15% of its
blocks are cached with a more regular, or random, distribution).

The stats I have collected so far suggest that the OS page cache remains
stable: the number of cached blocks per object does not change much once
the application has run long enough. Those are good stats for
automatically adjusting random_page_cost and seq_page_cost per table or
index (see the sketches at the end of this mail). The DBA provides
random_page_cost and seq_page_cost values that are accurate for the
hardware; perhaps we also want a mem_page_cost (?). Or we just adjust
random_page_cost and seq_page_cost based on the average amount of data in
cache. Actually, I think that updating *_page_cost while keeping the
current design of effective_cache_size (in costsize.c) may work well
enough.

> BTW, it seems that all these variants have an implicit assumption that
> if you're reading a small part of the table it's probably part of the
> working set; which is an assumption that could be 100% wrong.  I don't
> see a way around it without trying to characterize the data access at
> an unworkably fine level, though.

Exactly.

> regards, tom lane

--
Cédric Villemain                2ndQuadrant
http://2ndQuadrant.fr/ ; PostgreSQL : Expertise, Formation et Support
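P.S. To make the reporting side concrete, here is a minimal sketch of the
per-segment residency numbers such a worker could collect. It assumes
pgfincore's set-returning pgfincore() function with rel_os_pages (relation
size in OS pages) and pages_mem (pages found in the OS cache) columns;
check the column names against your pgfincore version, and 'my_table' is a
placeholder:

  -- OS cache residency per segment of a relation: the raw material for
  -- the auto-ANALYZE-like reporting described above.
  SELECT segment,
         pages_mem,
         rel_os_pages,
         round(100.0 * pages_mem / nullif(rel_os_pages, 0), 1) AS pct_cached
  FROM pgfincore('my_table');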
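And a rough sketch of the cost adjustment itself. mem_page_cost does not
exist today, so the 0.01 below merely stands in for it, blended against the
stock random_page_cost of 4.0 according to the cached fraction:

  -- Blend a hypothetical mem_page_cost with random_page_cost, weighted
  -- by the fraction of the relation resident in the OS page cache.
  WITH cache AS (
      SELECT sum(pages_mem)::float8
             / nullif(sum(rel_os_pages), 0) AS cached_frac
      FROM pgfincore('my_table')
  )
  SELECT cached_frac,
         cached_frac * 0.01 + (1 - cached_frac) * 4.0
             AS blended_random_page_cost
  FROM cache;

Since 9.0 such a value could at least be applied per tablespace with
ALTER TABLESPACE ... SET (random_page_cost = ...); a real per-table or
automatic adjustment is what the POC would have to explore.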