Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

"Heikki Linnakangas" <heikki@xxxxxxxxxxxxxxxx> · Tue, 11 Sep 2007 10:00:46 +0100

Luke Lonergan wrote:
> For plans that qualify with the above conditions, the executor will issue
> blocking calls to lseek(), which will translate to a single disk actuator
> moving to the needed location in seek_time, approximately 8ms. 

I doubt it's actually the lseeks that block. lseek() only updates the
file offset; it's the reads/writes after the lseeks that wait on the disk.

> If we implement AIO and allow for multiple pending I/Os used to prefetch
> groups of qualifying tuples, basically a form of random readahead, we can
> improve the throughput for any given query by taking advantage of multiple
> disk actuators.  

Rather than jumping to AIO, which is a huge change, I think we could get
much of the benefit by using posix_fadvise(WILLNEED) in strategic places
to tell the OS what pages we're going to need in the near future. If the
OS has implemented that properly, it should schedule I/Os for the
requested pages ahead of time. That would require very little change to
PostgreSQL code, and could simply be #ifdef'd away on platforms that
don't support posix_fadvise.
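
To make that concrete, here is an untested sketch of the kind of helper I
have in mind. The names prefetch_blocks and BLOCK_SIZE are made up for
illustration and are not existing PostgreSQL code, and testing for
POSIX_FADV_WILLNEED is just one assumed way to detect support:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for the posix_fadvise() declaration */
#endif
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 8192           /* hypothetical page size, for illustration */

/*
 * Tell the OS that the given blocks of fd will be read soon, so it can
 * start the I/Os before we issue the (blocking) reads.  Returns the number
 * of hints the OS accepted.  The hint is advisory only: the reads that
 * follow work the same whether or not it was honoured.
 */
static int
prefetch_blocks(int fd, const long *blknos, int nblocks)
{
    int hinted = 0;
    int i;

    for (i = 0; i < nblocks; i++)
    {
#ifdef POSIX_FADV_WILLNEED
        /* posix_fadvise() returns 0 on success, an error number otherwise */
        if (posix_fadvise(fd, (off_t) blknos[i] * BLOCK_SIZE,
                          BLOCK_SIZE, POSIX_FADV_WILLNEED) == 0)
            hinted++;
#else
        /* No posix_fadvise on this platform: the hint compiles away. */
        (void) fd;
        (void) blknos;
#endif
    }
    return hinted;
}
```

The caller would hint a batch of upcoming blocks and then read them in the
usual way. Whether the requests really end up spread over several spindles
depends entirely on the OS and the I/O scheduler.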

> Note that
> the same approach would also work to speed sequential access by overlapping
> compute and I/O.

Yes, though the OS should already be doing read-ahead for us. How
efficient that is, is another question. posix_fadvise(SEQUENTIAL) could
be used to give a hint on that as well.
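
Again only as an untested sketch (hint_sequential_scan is a made-up name,
not existing PostgreSQL code), the sequential hint could be given once
before starting a scan:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for the posix_fadvise() declaration */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Advise the OS that fd will be read sequentially.  Offset 0 with length 0
 * covers the whole file.  Returns 0 on success (or when the platform has
 * no such hint), otherwise the error number from posix_fadvise().
 */
static int
hint_sequential_scan(int fd)
{
#ifdef POSIX_FADV_SEQUENTIAL
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#else
    (void) fd;                    /* hint not available: do nothing */
    return 0;
#endif
}
```

What the kernel does with that advice, if anything, is up to the
implementation, so it would have to be measured.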

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
