Re: Planner features, discussion

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Tue, 13 Jul 2010 20:24:58 +0800

On 13/07/10 19:49, pasman pasmański wrote:
> Hello.
> 
> I propose 2 features for planner:
> 
> 1. Planner will estimate 2 x statistics: time of query with cache empty
> and with cache filled.

How would it know what is in cache and how long it'd take to fetch
things into cache that aren't already there?

Which cache(s)? shared_buffers? The OS cache of file system data? RAID
controller read caches? Disk drive read caches? Caches of a big SAN
iSCSI/FC target?

PostgreSQL relies on the OS and hardware to take care of all that, it
just asks for the data. It has very little idea how much of that data is
how "close" in caching terms, and how long it'll take to retrieve. The
very blunt instrument "effective_cache_size" and the random/sequential
IO cost knobs provide only the vaguest and most unrealistic guidance.

This might sound weird, but you need to realize that the planner needs
to keep things somewhat simple to be fast, and it *can't* try to keep
track of and simulate all levels of the system's caching behaviours.
Especially since some levels of caching may be completely invisible not
only to PostgreSQL, but even to the OS.

Consider a PostgreSQL instance running in a virtual machine, where the
storage is provided by a fibre channel SAN. The guest OS doesn't even
know it's virtualized - and that the host OS is caching its disk in RAM.
The host OS doesn't know that the SAN that's backing the disk has oodles
of RAM in which it's doing its own caching.

How is PostgreSQL supposed to figure out what's in cache?

> 2. Two levels of plannig: standard and long.
> Long planning may be used when standard optimization
> generate slow plan, and may use advanced algebraic transformations:

I do like the idea of being able to tell the planner "put more time into
trying to transform this query in ways that might make it faster".

However, that does mean two modes to be debugged, and a less-used "long"
mode of greater complexity to be maintained. Who's going to do the
coding to build and maintain this? What specific optimizations would you
propose for the "long" mode? I'm not sure I understand the one you did
propose as it applies to SQL ... wouldn't it involve a lot more
knowledge of what the two functions in question do than the planner ever
has?

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general