Gregory Stark wrote (in part):
> The extra spindles speed up sequential i/o too so the ratio between sequential
> and random with prefetch would still be about 4.0. But the ratio between
> sequential and random without prefetch would be even higher.

I never figured out how extra spindles help sequential I/O, because consecutive
logical blocks are not necessarily written consecutively in a Linux or UNIX file
system. The file system tries to group a bunch of blocks together (eight
512-byte blocks?), but that is about it. So even if you are reading
sequentially, the head actuator may be seeking around anyway. I suppose you
could fix this, if the database were reasonably static, by backing up the entire
database, doing a mkfs on the file system, and restoring it. That might make the
database more contiguous, at least for a while.

When I was working on a home-brew UNIX DBMS, I used raw I/O on a separate disk
drive so that the files could be contiguous, and that worked. Similarly, IBM's
DB2 does that (optionally), but it is my understanding that PostgreSQL does not.
On the other hand, the large (in my case) kernel cache can be helpful if I seek
back and forth among nearby records, since they may already be in the cache.

With 8 GB of RAM, I have shared_buffers set to 200,000, which should keep any
busy stuff in memory, and about 6 GB of RAM are presently available for the
system I/O cache. I have not optimized anything yet because I am still
laundering the major input data to initialize the database, so I do not have any
real transactions going through it yet.

I have 6 SCSI hard drives on two Ultra/320 SCSI controllers. Of the database
partitions, sda8 has the write-ahead log, sdb7 has a few tiny seldom-used tables
and pg_log, and sdc1, sdd1, sde1, and sdf1 are just for the other tables. For
the data on sd[c-f]1 (there is nothing else on these drives), I keep the index
for a table on a different drive from the data. When populating the database
initially, this seems to help: I tend to fill one table, or a very few tables,
at a time, so a table and its index do not contend for the head actuator.
Presumably the SCSI controllers can do simultaneous seeks on the various drives
and one transfer on each controller.

When loading the database (using INSERTs mainly -- because the input data are
gawdawful unnormalized spreadsheets obtained from elsewhere, growing once a
week), the system is I/O limited by seeks (and rotational latency). I/O
transfers average about 1.7 megabytes/second, although there are peaks that
exceed 10 megabytes/second. If I run pg_restore from a backup tape, I can see
90 megabytes/second transfer rates in bursts of several seconds at a time, but
that is pretty much a record.

-- 
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 06:35:01 up 33 days, 9:57, 0 users, load average: 4.06, 4.07, 4.02
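
For reference: shared_buffers is counted in buffers of the default 8 kB block
size, so the 200,000 mentioned above works out to roughly 1.6 GB of dedicated
cache. The postgresql.conf excerpt below only illustrates that arithmetic; the
effective_cache_size line is an assumption derived from the ~6 GB of kernel
cache figure, not a setting stated in the message.

    # Illustrative postgresql.conf excerpt (bare values are counts of 8 kB buffers/pages)
    shared_buffers = 200000          # 200,000 x 8 kB ~= 1.6 GB in PostgreSQL's own buffer cache
    effective_cache_size = 750000    # ~6 GB: a hint to the planner about likely kernel caching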
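
The index-on-a-separate-drive layout described above maps onto tablespaces
(PostgreSQL 8.0 and later). A minimal sketch follows, assuming one tablespace
per spindle; the mount points, table, and index names are made up for
illustration and are not taken from the setup above.

    -- One tablespace per spindle; each directory must already exist and be
    -- owned by the postgres user.
    CREATE TABLESPACE data_sdc LOCATION '/srv/pg/sdc1/data';
    CREATE TABLESPACE idx_sdd  LOCATION '/srv/pg/sdd1/index';

    -- Put the table's heap on one drive ...
    CREATE TABLE weekly_load (
        id      bigint,
        payload text
    ) TABLESPACE data_sdc;

    -- ... and its index on another, so a bulk load does not make the table and
    -- its index contend for the same head actuator.
    CREATE INDEX weekly_load_id_idx ON weekly_load (id) TABLESPACE idx_sdd;

Existing tables and indexes can also be moved after the fact with
ALTER TABLE ... SET TABLESPACE and ALTER INDEX ... SET TABLESPACE.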