Hi Mark,

Sorry for the late reply; I didn't receive this mail, so I missed the message for several days...
http://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg00624.html

Your advice is very, very helpful to me!!! Thanks :)
I have done the following tests and have some questions.
1)
I concurrently ran

    dd if=/dev/zero of=/dev/sd[b,c,d,e,f...n] bs=4096k count=10000 oflag=direct

on each SATA disk. collectl shows:

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   0   0  2935    636      0      0 866560   2708      1      9      0       1
   0   0  2939    718      0      0 865620   2708      2     14      1       4
   0   0  2872    631      0      0 868480   2714      1      8      0       1
   0   0  2937    621      0      0 864640   2702      1      9      0       4

Total write throughput is about 860MB/s.

Using RADOS bench:

    rados -p rbd bench 300 write -t 256

collectl shows:

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  22  10  6991  17947      4      1   999K   3111      4     31     48      22
  18   8  6151  16116      0      0  1003K   2858      8     40     23      37
  19   9  6295  16031      8      2  1002K   2458      2     22     44      17

Total write throughput is about 1000MB/s.

The expander backplane runs at 3.0Gb/s and is attached through a 4-lane Mini-SAS port: 4 * 3Gb/s = 12Gb/s, which is roughly 1.2GB/s of usable bandwidth after 8b/10b encoding. So I think the write throughput is stuck at about 1000MB/s because the expander backplane is the bottleneck for sequential writes. If the expander backplane could run at 6.0Gb/s, the total write throughput should increase, right?
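For reference, the concurrent dd runs above were launched roughly like this (just a sketch; the /dev/sd{b..n} range is simply how the data disks happen to be named on my node):

    # start one direct-I/O sequential writer per SATA data disk, all in parallel
    for dev in /dev/sd{b..n}; do
        dd if=/dev/zero of="$dev" bs=4096k count=10000 oflag=direct &
    done
    wait    # return only after every dd has finished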
2)
OSD & journal settings:

a. The OSD filesystems are EXT4 (no osd mkfs options set):
   osd mkfs type = ext4
   osd mount options ext4 = rw,data=
   filestore_xattr_use_omap = true
b. The SSD journals are raw devices with no filesystem; each SSD is divided into two (aligned) partitions.

LSI MegaRAID SAS 9260-4i settings:

a. Every HDD: RAID0, Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct, Disk cache: unchanged
b. Every SSD: RAID0, Write Policy: Write Through, Read Policy: NoReadAhead, IO Policy: Direct, Disk cache: disabled

Because the previous results used a pool with 576 placement groups, I did a new test with pg_num=2048 and a 9 OSDs + 3 SSDs configuration!!

Read:  rados -p testpool bench 300 seq -t 256
Write: rados -p testpool bench 300 write -t 256 --no-cleanup

Rados Bench TEST (Read):
2x replication & 12 OSDs case:                   Bandwidth (MB/sec): 1373.013
1x replication & 12 OSDs case:                   Bandwidth (MB/sec): 1478.694
2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 1442.543
1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 1448.407
2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 1485.175
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 447.245

Rados Bench TEST (Write):
2x replication & 12 OSDs case:                   Bandwidth (MB/sec): 228.064
1x replication & 12 OSDs case:                   Bandwidth (MB/sec): 457.213
2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 224.430
1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 482.104
2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 239
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 485

In the Rados Bench read tests I originally expected more OSDs to increase the read bandwidth, but the results show about 1400MB/s in most cases. Is this the page cache intervening? I didn't see any read operations on the disks...
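A quick way to rule out the page cache next time (just a sketch, run as root on every OSD node right before the read benchmark) would be:

    sync                                  # flush dirty data first
    echo 3 > /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes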
In the Rados Bench write test I ran the 9 OSDs + 3 SSDs (Journal) configuration and watched it with collectl:

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc              0      0    0    0   47336    382  335  141     141     0     2      0    7
sdd              0      0    0    0   65600      0  240  273     273     1     4      0    7
sde              0      0    0    0   56440      0  207  273     272    64   342      4   99
sdg              0      0    0    0   43544    450  326  134     133    39   135      3   99
sdf              0      0    0    0   65600      0  240  273     273     0     2      0    7
sdh              0      0    0    0   57400      0  210  273     273     0     2      0    7
sdi              0      0    0    0   69012    227  251  275     274    90   560      3   99
sdj              0      0    0    0   66944    424  308  217     217     1     5      0    7
sdb              0      0    0    0   43496      0  159  274     273     6    41      3   51

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdk              0      0    0    0  178016      0  599  297     297   219   390      1   99
sdl              0      0    0    0  156540      0  536  292     292    27    51      1   96
sdm              0      0    0    0  166724      0  578  288     288    31    42      1   95

The write throughput of the SSDs (sdk/sdl/sdm) increased to about 170MB/s each, which suggests three SSDs are enough for the journals.
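(For reference, the per-disk table above is collectl's disk-detail view; as far as I remember the invocations were roughly:

    collectl -scdn    # summary view: CPU / disks / network totals
    collectl -sD      # detail view: per-disk DISK STATISTICS
)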
But I don't understand why 9 OSDs + 3 SSDs gets more write bandwidth than 8 OSDs + 4 SSDs...?

Another question: 1x replication gives about 490MB/s of write bandwidth on one storage server. Does that mean that if I have two storage servers and use 2x replication with one copy on each server, the write bandwidth can still reach 490MB/s?
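If I understand correctly, the settings involved in that two-server case would be something like the following ceph.conf lines (a sketch with illustrative values, assuming the default CRUSH map where each host is its own failure domain):

    [global]
    osd pool default size = 2        # keep two copies of each object
    osd crush chooseleaf type = 1    # spread the copies across hosts (bucket type 1 = host)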
3)
I used an HBA (LSI 9211-8i: LSISAS2008) instead of the "smart" RAID card (LSI 9260-4i):

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 247.047
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 492.798

The results look a little better than with the RAID card!!
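(With the HBA I should also be able to confirm the negotiated link speed toward the expander/backplane from sysfs; a sketch, assuming the LSISAS2008 exposes its phys under /sys/class/sas_phy:

    # print the negotiated rate of every SAS phy, including the wide port to the expander
    grep . /sys/class/sas_phy/*/negotiated_linkrate
)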
4)
XFS:
   osd mkfs type = xfs
   osd mkfs options xfs = -f -i size=2048
   osd mount options xfs = rw,inode64,noatime
   filestore_xattr_use_omap = true

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 248.327
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 494.292

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  22   9  6416  21901      0      0  1004K   2514     10     39    113      37
  20   9  6147  20391      0      0  1009K   2565      3     39     62      32
  19   8  6488  19780      0      0  1006K   2448      7     31     21      23
  21  10  6730  19695      0      0  1003K   2430      5     16      1      11
  20   9  6755  19244      0      0   994K   2491      3     15     25      12

Total write throughput is also stuck at about 1GB/s; I think for the same reason as in 1).

BTRFS:
   osd mkfs type = btrfs
   osd mkfs options btrfs = -l 16k -n 16k
   osd mount options btrfs = -o noatime

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 243.817
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 487.809

The results are about the same as with EXT4 / XFS.

Next time I will try a second controller and connect my disks and SSDs directly.
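For reference, those mkfs/mount settings correspond roughly to the following manual commands (a sketch; /dev/sdX and the OSD number N are placeholders):

    mkfs.xfs -f -i size=2048 /dev/sdX                               # XFS with 2048-byte inodes
    mount -o rw,inode64,noatime /dev/sdX /var/lib/ceph/osd/ceph-N

    mkfs.btrfs -l 16k -n 16k /dev/sdX                               # btrfs with 16k leaf/node size
    mount -o noatime /dev/sdX /var/lib/ceph/osd/ceph-N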
Thanks!! :)
-
Kelvin