Re: suggestions for postgresql setup on Dell 2950 , PERC6i controller


On 2/17/09 11:52 PM, "Rajesh Kumar Mallah" <mallah.rajesh@xxxxxxxxx> wrote:

the raid10 volume was benchmarked again,
taking the above points into consideration

Effect of readahead settings: disabled, 256 (default), 512, 1024
(all tests on sda6)

xfs_ra0         414741,  66144
xfs_ra256       403647, 545026
xfs_ra512       411357, 564769
xfs_ra1024      404392, 431168

Looks like 512 was the best setting for this controller.

Try 4096 or 8192 (or, just to see, 32768); with a sufficient readahead value you should get numbers very close to a raw partition with xfs.  It is controller dependent for sure, but I usually see a "small peak" in performance at 512 or 1024, followed by a dip, then a larger peak and plateau at somewhere near (# of drives * the small peak).  The higher quality the controller, the less you need to fiddle with this.
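For reference, a quick way to check and change the setting (8192 here is just one of the values suggested above; the value is in 512-byte blocks):

    /sbin/blockdev --getra /dev/sda6           # current readahead, in 512-byte blocks
    /sbin/blockdev --setra 8192 /dev/sda6      # 8192 blocks = 4MB; re-run the benchmark after changing it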
I use a script that runs fio benchmarks with the following profiles, with readahead values from 128 to 65536.  The single-reader STR test peaks with a smaller readahead value than the concurrent-reader ones (2 or 8 concurrent sequential readers), and the mixed random/sequential read loads become more biased toward sequential transfer (and thus higher overall throughput in bytes/sec) with larger readahead values.  The choice between the cfq and deadline schedulers, however, affects the priority of random vs. sequential reads more than the readahead does, cfq favoring random access because it divides I/O by time slice.
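For reference, switching the elevator per device is just a sysfs write (assuming /dev/sda, as root):

    cat /sys/block/sda/queue/scheduler              # the active scheduler is shown in brackets
    echo deadline > /sys/block/sda/queue/scheduler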

The FIO profiles I use for benchmarking are at the end of this message.
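Roughly, that kind of readahead sweep can be scripted like this (the device and job file names are placeholders, not my actual script):

    #!/bin/sh
    # sweep block device readahead and run a fio job file at each setting
    DEV=/dev/sda6
    for ra in 128 256 512 1024 2048 4096 8192 16384 32768 65536; do
        /sbin/blockdev --setra $ra $DEV
        echo "readahead=$ra"
        fio read-seq.fio
    done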


Considering these two figures
xfs25     350661, 474481     (/dev/sda7)
25xfs     404291, 547672     (/dev/sda6)

Looks like the beginning of the drives is about 15% faster
than the ending sections. Considering this, is it worth
creating a special tablespace at the beginning of the drives?

For SAS drives, it's typically a ~15% to 25% degradation (the last 5% is definitely slow).  For SATA 3.5" drives, the last 5% has about 50% of the STR of the front.
Graphs about halfway down this page show what it looks like for a typical SATA drive: http://www.tomshardware.com/reviews/Seagate-Barracuda-1-5-TB,2032-5.html
And a couple of figures for some SAS drives are here: http://www.storagereview.com/ST973451SS.sr?page=0%2C1
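If you do carve out a partition at the front of the array for that, pointing a tablespace at it is straightforward; a minimal sketch, assuming the fast partition is already formatted and mounted at /data/fast (all paths and names below are made up):

    mkdir /data/fast/pg_fast
    chown postgres:postgres /data/fast/pg_fast
    psql -U postgres -c "CREATE TABLESPACE fast_ts LOCATION '/data/fast/pg_fast'"
    psql -U postgres -d yourdb -c "ALTER TABLE hot_table SET TABLESPACE fast_ts"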


> If testing STR, you will also want to tune the block device readahead value (example: /sbin/blockdev --getra
> /dev/sda6).  This has a very large impact on sequential transfer performance (and no impact on random access).
> How large of an impact depends quite a bit on what kernel you're on, since the readahead code has been getting
> better over time and requires less tuning.  But it still defaults out-of-the-box to settings better suited to a
> single drive than to RAID.
> For SAS, try 256 or 512 * the number of effective spindles (spindles * 0.5 for RAID 10).  For SATA, try 1024 or
> 2048 * the number of effective spindles.  The value is in blocks (512 bytes).  There is documentation on the
> blockdev command, and here is a little write-up I found with a couple of web searches:
> http://portal.itauth.com/2007/11/20/howto-linux-double-your-disk-read-performance-single-command
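As a worked example of that formula (the drive count here is hypothetical): a 6-disk SAS RAID 10 has 6 * 0.5 = 3 effective spindles, so a starting point would be 512 * 3 = 1536 blocks:

    /sbin/blockdev --setra 1536 /dev/sda6      # 1536 * 512 bytes = 768KB of readahead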


FIO benchmark profile examples (long, posting here for the archives):
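Each profile below goes into its own fio job file and is run directly, as root so the drop_caches prerun works; for example (the filename is just an illustration):

    fio read-seq.fio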


*Read benchmarks, sequential:

[read-seq]
; one sequential reader reading one 64g file
rw=read
size=64g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-seq]
; two sequential readers, each concurrently reading a 32g file, for a total of 64g max
rw=read
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=2
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-seq]
; eight sequential readers, each concurrently reading a 8g file, for a total of 64g max
rw=read
size=8g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches


*Read benchmarks, random 8k reads.

[read-rand]
; random access on 2g file by single reader, best case scenario.
rw=randread
size=2g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-rand]
; 8 concurrent random readers each to its own 1g file
rw=randread
size=1g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

*Mixed Load:

[global]
; one random reader concurrently with one sequential reader.
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[seq-read]
rw=read
size=64g
numjobs=1
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=1
nrfiles=1


[global]
; Four sequential readers concurrent with four random readers
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
rw=read
size=8g
numjobs=4
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=4
nrfiles=1



*Write tests

[write-seq]
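; one sequential writer writing a 32g file, with an fsync at the end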
rw=write
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
end_fsync=1

[write-rand]
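; one random writer re-writing blocks within a 32g file, with an fsync at the end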
rw=randwrite
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
; overwrite=1 is MANDATORY for xfs, otherwise the writes are sparse random writes and can slow performance to near zero.  Postgres only does random re-writes, never sparse random writes.
overwrite=1
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
end_fsync=1
