Re: RBD performance test (write) problem

Hi Xiaxi,
Thanks for your answer!!

FIO test:
4MB Sequential write (numjobs=1): 203 MB/s (close to rados bench write)
4MB Random write (numjobs=8): 145 MB/s
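
The commands were roughly like these (the RBD device path, ioengine and iodepth here are illustrative, not necessarily exactly what I used):

# fio --name=seq-write --filename=/dev/rbd0 --rw=write --bs=4M --numjobs=1 --ioengine=libaio --iodepth=16 --direct=1 --runtime=300 --group_reporting
# fio --name=rand-write --filename=/dev/rbd0 --rw=randwrite --bs=4M --numjobs=8 --ioengine=libaio --iodepth=16 --direct=1 --runtime=300 --group_reporting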

but I still have some questions about write performance.

According to this message:
http://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg06809.html

when the client sends the request:
1) Go through client side processing.
2) Travel over the IP network to the destination OSD.
3) Go through all of the queue processing code on the OSD.
4a) Write the data to the journal (Or the faster of the journal/data disk when using btrfs. Note: The journal writes may stall if the data disk is too slow and the journal has gotten sufficiently ahead of it) 
4b) Complete replication to other OSDs based on the pool's replication level and the placement group the data gets put in. (basically steps 1,2,3,4a and 5 all over again with the OSD as the client).
5) Send the Ack back to the client over the IP network

----------------

The write performance of RBD should depend mainly on steps 4a) and 4b), right?
I am confused about why the sequential write bandwidth is stuck at 213~220 MB/s while the journal SSDs and data disks are not saturated (SSDs at about 120 MB/s, HDDs at about 80 MB/s).
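As a rough back-of-the-envelope check (please correct me if I am misreading the collectl numbers): with 2 replicas, every client write is journaled twice, so 213 MB/s of client bandwidth implies roughly 2 x 213 = 426 MB/s of journal traffic, or about 426 / 4 = 107 MB/s per journal SSD, which is close to what collectl reports.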

- Kelvin



-----Original Message-----


1. The rados bench write is creating new objects, so it behaves more like sequential than random I/O. If you use XFS with the default mkfs parameters, you can use fio to generate 4M random writes on top of your disk; this shares a similar access pattern with Ceph.

2. rados bench does not scale well to a very large number of concurrent operations. You can use a lower concurrency (say 32 or 64) and run multiple rados bench instances to work around this issue, as shown below.

3. The performance result seems fair; you can refer to the official Ceph blog. If my memory is correct, Mark has seen performance similar to yours.
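
For point 2, something like this (the concurrency here is just an example; add up the bandwidth reported by each instance):

# rados -p rbd bench 300 write -t 32 &
# rados -p rbd bench 300 write -t 32 &
# wait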

Sent from my iPhone

On 2013-4-6, at 13:12, "Kelvin_Huang@xxxxxxxxxx" <Kelvin_Huang@xxxxxxxxxx> wrote:

Hi all,

I have some problem after my RBD performance test

Setup:
Linux kernel: 3.6.11
OS: Ubuntu 12.04
RAID card: LSI MegaRAID SAS 9260-4i (each HDD configured as a single-disk RAID0; Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct)
Storage server number: 1
Storage server:
8 * HDD (7200 rpm, 2 TB; each storage server has 8 OSDs)
4 * SSD (each SSD is split into two partitions, sdx1 and sdx2, and serves as the journal for 2 OSDs; see the ceph.conf sketch below)

Ceph version : 0.56.4
Replicas : 2
Monitor number:1
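
The journal layout in ceph.conf looks roughly like this (the host name and device letters below are placeholders, not my exact configuration):

[osd.0]
    host = storage01
    osd journal = /dev/sdb1

[osd.1]
    host = storage01
    osd journal = /dev/sdb2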


The write speed of HDD:
# dd if=/dev/zero of=/dev/sdd bs=1024k count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 69.3961 s, 151 MB/s

The write speed of SSD:
# dd if=/dev/zero of=/dev/sdb bs=1024k count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 40.8671 s, 257 MB/s


Then we use the RADOS benchmark and collectl to observed write performance

#rados -p rbd bench 300 write -t 256

2013-04-05 14:31:13.732737min lat: 4.28207 max lat: 5.92085 avg lat: 4.78598
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   300     256     16043     15787   210.455       196      5.91   4.78598
Total time run:         300.588962
Total writes made:      16043
Write size:             4194304
Bandwidth (MB/sec):     213.488

Stddev Bandwidth:       40.6795
Max bandwidth (MB/sec): 288
Min bandwidth (MB/sec): 0
Average Latency:        4.75647
Stddev Latency:         0.37182
Max latency:            5.93183
Min latency:            0.590936



collectl on OSDs :
#collectl  --iosize -sCDN --dskfilt "sd(c|d|e|f|g|h|i|j)"

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc              0      0    0    0   76848    563  460  167     167    12    26      0   42
sdd              0      0    0    0   45100      0  165  273     273     6    36      1   30
sde              0      0    0    0   73800      0  270  273     273     3    14      1   41
sdf              0      0    0    0   73800      0  270  273     273    17    64      1   33
sdg              0      0    0    0   41000      0  150  273     273     1     7      0   10
sdh              0      0    0    0   57400      0  210  273     273     4    20      1   27
sdi              0      0    0    0   36904      0  136  271     271     0     5      0    7
sdj              0      0    0    0   77776      0  285  273     272    28    87      1   48


collectl on SSDs :
#collectl  --iosize -sCDN --dskfilt "sd(b|k|l|m)"

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdb              0      0    0    0  115552      0  388  298     297    75   159      2   77
sdk              0      0    0    0  114592      0  389  295     294    12    33      0   38
sdl              0      0    0    0  100364      0  334  300     300    35   148      2   69
sdm              0      0    0    0  101644      0  345  295     294   245   583      2   99 <= almost 99%



My questions are:
1. Is the rados bench write a random write?

2. Why does the write bandwidth hit a ceiling at ~213 MB/s even when I increase the concurrency (-t 512)?
   It looks a bit odd, because collectl shows each SSD's write throughput is only about 100-120 MB/s, while the SSDs should be capable of ~250 MB/s.

3. Why is one SSD (sdm) at almost 99% [Util]? Does that mean the data written to the OSDs is not evenly distributed?

4. If the SSDs are not the bottleneck for write performance, what is?

5. How can I improve write performance?

Thanks!!

- Kelvin

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




