Re: RBD performance test (write) problem

Hi Mark,

 

Sorry for replying so late; I didn't receive this mail, so I missed the message for several days...

http://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg00624.html

 

 

Your advice is very, very helpful to me !!! Thanks :)

 

I have done the following tests and have some questions.

 

1)      I ran dd if=/dev/zero of=/dev/sd[b,c,d,e,f ...n] bs=4096k count=10000 oflag=direct concurrently on each SATA disk.
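
(For reference, the concurrent dd runs can be launched with something like the loop below; the exact device range sdb..sdn and the loop form are just illustrative.)

for d in /dev/sd{b..n}; do
  dd if=/dev/zero of="$d" bs=4096k count=10000 oflag=direct &   # one direct-I/O writer per SATA disk
done
wait   # wait for all dd processes to finish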

 

collectl shows:

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->

#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut

   0   0  2935    636      0      0 866560   2708      1      9      0       1

   0   0  2939    718      0      0 865620   2708      2     14      1       4

   0   0  2872    631      0      0 868480   2714      1      8      0       1

   0   0  2937    621      0      0 864640   2702      1      9      0       4

 

Total write throughput is about 860MB/s.

 

Using RADOS bench (300 seconds, 256 concurrent writes): rados -p rbd bench 300 write -t 256

collectl shows:

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->

#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut

  22  10  6991  17947      4      1   999K   3111      4     31     48      22

  18   8  6151  16116      0      0  1003K   2858      8     40     23      37

  19   9  6295  16031      8      2  1002K   2458      2     22     44      17

 

Total write throughput is about 1000MB/s.
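
(For reference, the collectl output in this mail comes from invocations along these lines; treat the exact flags as approximate.)

collectl -scdn    # brief summary: CPU, disk and network totals, one line per interval
collectl -sD      # detailed per-disk statistics (the "DISK STATISTICS (/sec)" tables further below)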

 

The expander backplane runs at 3.0Gb/s and is connected through a 4-lane Mini-SAS port: 4 * 3Gb/s = 12Gb/s ~= 1GB/s, so I think the write throughput is stuck at about 1000MB/s because the expander backplane is the bottleneck for sequential writes.

If the expander backplane could run at 6.0Gb/s, then the total write throughput should increase, right?
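
(My back-of-envelope math, assuming the SAS links use 8b/10b encoding:
3Gb/s per lane * 8/10 ~= 300MB/s of payload per lane, so 4 lanes ~= 1200MB/s, close to the ~1000MB/s ceiling I see;
at 6.0Gb/s it would be ~600MB/s per lane, ~2400MB/s for 4 lanes, so the ceiling should roughly double.)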

 

 

 

2)       

OSD & journal settings:

a. OSD filesystems are EXT4, with no osd mkfs options used:

osd mkfs type = ext4

osd mount options ext4 = rw,data=...

filestore_xattr_use_omap = true

 

b. The SSD journals are raw disks with no filesystem, each divided into two aligned partitions (see the partitioning sketch just below).
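
(A sketch of how the two aligned journal partitions can be created on one of the SSDs; the device name /dev/sdk and the 50/50 split are assumptions, not necessarily exactly what I used.)

parted -s /dev/sdk mklabel gpt                  # GPT label, partitions start 1MiB-aligned
parted -s /dev/sdk mkpart journal-1 1MiB 50%    # journal partition for the first OSD
parted -s /dev/sdk mkpart journal-2 50% 100%    # journal partition for the second OSD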

 

LSI MegaRAID SAS 9260-4i setting:

a. every HDD : RAID0 , Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct, Disk cache: unchanged

b. every SSD  : RAID0 , Write Policy: Write Through, Read Policy: NoReadAhead, IO Policy: Direct, Disk cache: disabled

 

Because the previous results were with pool size = 576, I did a new test with pool size = 2048 and a 9 OSDs + 4 SSDs configuration !!
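
(The test pool was created along these lines, assuming "pool size" here means the placement group count; the exact commands are illustrative.)

ceph osd pool create testpool 2048 2048    # pg_num / pgp_num = 2048
ceph osd pool set testpool size 2          # 2x replication (set to 1 for the 1x runs)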

 

Read: rados -p testpool bench 300 seq -t 256

Write: rados  -p testpool bench 300 write -t 256 --no-cleanup

 

 

Rados Bench TEST (Read):

2x replication & 12 OSDs case: Bandwidth (MB/sec):    1373.013

1x replication & 12 OSDs case: Bandwidth (MB/sec):    1478.694

 

2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec):    1442.543

1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec):    1448.407

 

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):    1485.175

1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):    447.245

 

 

 

Rados Bench TEST (Write):

2x replication & 12 OSDs case: Bandwidth (MB/sec):     228.064

1x replication & 12 OSDs case: Bandwidth (MB/sec):     457.213

 

2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec):     224.430

1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec):     482.104

 

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):     239

1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):     485

 

In the RADOS bench read test, I originally expected that more OSDs would increase the read bandwidth, but the results show about 1400MB/s in most cases. Is this cache intervention? I didn't see any read operations on the disks...
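
(One way to rule out the page cache on the OSD nodes before the seq test, using the generic Linux drop_caches mechanism, run on each OSD host:)

sync                                         # flush dirty pages first
echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes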

 

In the RADOS bench write test, I tested the 9 OSDs + 3 SSDs (journal) configuration and observed it with collectl:

 

# DISK STATISTICS (/sec)

#          <---------reads---------><---------writes---------><--------averages--------> Pct

#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util

sdc              0      0    0    0   47336    382  335  141     141     0     2      0    7

sdd              0      0    0    0   65600      0  240  273     273     1     4      0    7

sde              0      0    0    0   56440      0  207  273     272    64   342      4   99

sdg              0      0    0    0   43544    450  326  134     133    39   135      3   99

sdf              0      0    0    0   65600      0  240  273     273     0     2      0    7

sdh              0      0    0    0   57400      0  210  273     273     0     2      0    7

sdi              0      0    0    0   69012    227  251  275     274    90   560      3   99

sdj              0      0    0    0   66944    424  308  217     217     1     5      0    7

sdb              0      0    0    0   43496      0  159  274     273     6    41      3   51

 

# DISK STATISTICS (/sec)

#          <---------reads---------><---------writes---------><--------averages--------> Pct

#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util

sdk              0      0    0    0  178016      0  599  297     297   219   390      1   99

sdl              0      0    0    0  156540      0  536  292     292    27    51      1   96

sdm              0      0    0    0  166724      0  578  288     288    31    42      1   95

 

The write throughput of the SSDs increased to about ~170MB/s each, which suggests three SSDs are enough for the journals (3 * ~170MB/s ~= 510MB/s, roughly matching the ~490MB/s client write bandwidth at 1x replication),

but I don't understand why 9 OSDs + 3 SSDs gets more write bandwidth than 8 OSDs + 4 SSDs ...?

 

Another question: 1x replication gives ~490MB/s of write bandwidth on one storage server. Does that mean that if I have two storage servers and use 2x replication across them, the write bandwidth can still reach ~490MB/s?
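
(My rough reasoning: with 2x replication across two servers, every client write is stored once on each server, so each server has to absorb the full client bandwidth; 2 servers * ~490MB/s of per-server write capacity / 2 copies ~= ~490MB/s of client bandwidth, assuming the journals and the network are not the new bottleneck.)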

 

 

3)       

I used an HBA (LSI 9211-8i, LSISAS2008) instead of the "smart" RAID card (LSI 9260-4i).

 

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):     247.047

1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):     492.798

 

The results look a little better than with the RAID card !!

 

4)       

XFS:

osd mkfs type = xfs

osd mkfs options xfs = -f -i size=2048

osd mount options xfs = rw,inode64,noatime

filestore_xattr_use_omap = true
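
(For reference, these settings correspond roughly to the following manual commands; the device and mount point are placeholders.)

mkfs.xfs -f -i size=2048 /dev/sdb1                               # 2048-byte inodes leave room for Ceph xattrs
mount -o rw,inode64,noatime /dev/sdb1 /var/lib/ceph/osd/ceph-0   # example OSD data path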

 

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):    248.327

1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):    494.292

 

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->

#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut

  22   9  6416  21901      0      0  1004K   2514     10     39    113      37

  20   9  6147  20391      0      0  1009K   2565      3     39     62      32

  19   8  6488  19780      0      0  1006K   2448      7     31     21      23

  21  10  6730  19695      0      0  1003K   2430      5     16      1      11

  20   9  6755  19244      0      0   994K   2491      3     15     25      12

 

Total write throughput is also stuck at about 1GB/s; I think it's the same reason as in 1).

 

 

BTRFS:

osd mkfs type = btrfs

osd mkfs options btrfs = -l 16k -n 16k

osd mount options btrfs = noatime
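
(Roughly the equivalent manual commands, again with placeholder device and mount point.)

mkfs.btrfs -l 16k -n 16k /dev/sdb1                    # 16KB leaf/node size, matching the config above
mount -o noatime /dev/sdb1 /var/lib/ceph/osd/ceph-0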

 

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):     243.817

1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):     487.809

 

The results are about the same as with EXT4 / XFS.

 

Next time I will try using a 2nd controller and connecting my disks and SSDs directly.

 

Thanks !! :)

-          Kelvin

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
