On 2/17/09 11:52 PM, "Rajesh Kumar Mallah" <mallah.rajesh@xxxxxxxxx> wrote:
the raid10 voulme was benchmarked againTry 4096 or 8192 (or just to see, 32768), you should get numbers very close to a raw partition with xfs with a sufficient readahead value. It is controller dependant for sure, but I usually see a “small peak” in performance at 512 or 1024, followed by a dip, then a larger peak and plateau at somewhere near # of drives * the small peak. The higher quality the controller, the less you need to fiddle with this.
taking in consideration above points
Effect of ReadAhead Settings
disabled,256(default) , 512,1024
xfs_ra0 414741 , 66144
xfs_ra256 403647, 545026 all tests on sda6
xfs_ra512 411357, 564769
xfs_ra1024 404392, 431168
looks like 512 was the best setting for this controller
I use a script that runs fio benchmarks with the following profiles with readahead values from 128 to 65536. The single reader STR test peaks with a smaller readahead value than the concurrent reader one (2 ot 8 concurrent sequential readers) and the mixed random/sequential read loads become more biased to sequential transfer (and thus, higher overall throughput in bytes/sec) with larger readahead values. The choice between the cfq and deadline scheduler however will affect the priority of random vs sequential reads more than the readahead — cfq favoring random access due to dividing I/O by time slice.
The FIO profiles I use for benchmarking are at the end of this message.
For SAS drives, its typically a ~15% to 25% degradation (the last 5% is definitely slow). For SATA 3.5” drives the last 5% is 50% the STR as the front.
Considering these two figures
xfs25 350661, 474481 (/dev/sda7)
25xfs 404291 , 547672 (/dev/sda6)
looks like the beginning of the drives are 15% faster
than the ending sections , considering this is it worth
creating a special tablespace at the begining of drives
Graphs about half way down this page show what it looks like for a typical SATA drive: http://www.tomshardware.com/reviews/Seagate-Barracuda-1-5-TB,2032-5.html
And a couple figures for some SAS drives here http://www.storagereview.com/ST973451SS.sr?page=0%2C1
FIO benchmark profile examples (long, posting here for the archives):
>
> If testing STR, you will also want to tune the block device read ahead value (example: /sbin/blockdev -getra
> /dev/sda6). This has very large impact on sequential transfer performance (and no impact on random access). >How large of an impact depends quite a bit on what kernel you're on since the readahead code has been getting >better over time and requires less tuning. But it still defaults out-of-the-box to more optimal settings for a single >drive than RAID.
> For SAS, try 256 or 512 * the number of effective spindles (spindles * 0.5 for raid 10). For SATA, try 1024 or >2048 * the number of effective spindles. The value is in blocks (512 bytes). There is documentation on the >blockdev command, and here is a little write-up I found with a couple web searches:
>http://portal.itauth.com/2007/11/20/howto-linux-double-your-disk-read-performance-single-command
*Read benchmarks, sequential:
[read-seq]
; one sequential reader reading one 64g file
rw=read
size=64g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
; two sequential readers, each concurrently reading a 32g file, for a total of 64g max
rw=read
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=2
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
; eight sequential readers, each concurrently reading a 8g file, for a total of 64g max
rw=read
size=8g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
*Read benchmarks, random 8k reads.
[read-rand]
; random access on 2g file by single reader, best case scenario.
rw=randread
size=2g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-rand]
; 8 concurrent random readers each to its own 1g file
rw=randread
size=1g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
*Mixed Load:
[global]
; one random reader concurrently with one sequential reader.
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[seq-read]
rw=read
size=64g
numjobs=1
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=1
nrfiles=1
[global]
; Four sequential readers concurrent with four random readers
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
[read-seq]
rw=read
size=8g
numjobs=4
nrfiles=1
[read-rand]
rw=randread
size=1g
numjobs=4
nrfiles=1
*Write tests
[write-seq]
rw=write
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
end_fsync=1
[write-rand]
rw=randwrite
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
; overwrite= 1 is MANDATORY for xfs, otherwise the writes are sparse random writes and can slow performance to near zero. Postgres only does random re-writes, never sparse random writes.
overwrite=1
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
end_fsync=1;