Re: RAID arrays and performance

Mark Mielke <mark@xxxxxxxxxxxxxx> · Tue, 04 Dec 2007 17:14:55 -0500

James Mansion wrote:
Mark Mielke wrote:
At a minimum, this breaks your query into: 1) Preload all the index 
pages you will need
Isn't this fairly predictable - the planner has chosen the index so it 
will be operating
on a bounded subset.
What is the bounded subset? It is bounded by the value. What value? You 
need to access the first page before you know what the second page is. 
PostgreSQL or the kernel should already have the hottest pages in 
memory, so the value of doing async I/O is very likely the cooler pages 
that are unique to the query. We don't know what the cooler pages are 
until we follow three tree down.

, 2) Scan the index pages you needed
Yes, and AIO helps when you can scan them in arbitrary order, as they 
are returned.

I don't think you are talking about searching a B-Tree, as the order is 
important when searching, and search performance would be reduced if one 
reads and scans more pages than necessary to map from the value to the 
row. I presume you are talking about scanning the entire index. Where 
"needed" means "all". Again, this only benefits a subset of the queries.

, 3) Preload all the table page you will need
No - just request that they load.  You can do work as soon as any page 
is returned.

The difference between preload and handling async I/O in terms of 
performance is debatable. Greg reports that async I/O on Linux sucks, 
but posix_fadvise*() has substantial benefits. posix_fadvise*() is 
preload not async I/O (he also reported that async I/O on Solaris seems 
to work well). Being able to do work as the first page is available is a 
micro-optimization as far as I am concerned at this point (one that may 
not yet work on Linux), as the real benefit comes from utilizing all 12 
disks in Matthew's case, not from guaranteeing that data is processed as 
soon as possible.

, 4) Scan the table pages you needed.
In the order that is most naturally returned by the disks.

Micro-optimization.

But do you really need the whole index? 
I don't think I suggested that.
What if you only need parts of the index, and this plan now reads the 
whole index using async I/O "just in case" it is useful?
Where did you get that from?

I get it from your presumption that you can know which pages of the 
index to load in advance. The only way you can know which pages must be 
loaded, is to know that you need to query them all. Unless you have some 
way of speculating with some degree of accuracy which colder pages in 
the index you will need, without looking at the index?

index scan into a regular B-Tree scan, which is difficult to perform 
async I/O for, as you don't necessarily know which pages to read next.
Most B-trees are not so deep.  It would generally be a win to retain 
interior nodes of indices in
shared memory, even if leaf pages are not present.  In such a case, it 
is quite quick to establish
which leaf pages must be demand loaded.

This is bogus. The less deep the B-Tree is, the less there should be any 
requirement for async I/O. Hot index pages will be in cache.

I'm not suggesting that Postgres indices are structured in a way that 
would support this sort
of thing now.

In your hand waving, you have assumed that PostgreSQL B-Tree index might 
need to be replaced? :-)

Cheers,
mark

--
Mark Mielke <mark@xxxxxxxxx>

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings