Hi Xiaxi,

Thanks for your answer!!

FIO test:
  4MB sequential write (numjobs=1): 203 MB/s (close to rados bench write)
  4MB random write (numjobs=8):     145 MB/s

But I still have some questions about write performance.
According to this message:
http://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg06809.html

when the client sends the request:
1) Go through client side processing.
2) Travel over the IP network to the destination OSD.
3) Go through all of the queue processing code on the OSD.
4a) Write the data to the journal (or the faster of the journal/data disk when using btrfs. Note: the journal writes may stall if the data disk is too slow and the journal has gotten sufficiently ahead of it).
4b) Complete replication to other OSDs based on the pool's replication level and the placement group the data gets put in (basically steps 1, 2, 3, 4a and 5 all over again with the OSD as the client).
5) Send the Ack back to the client over the IP network.
----------------
So the write performance of RBD should depend on steps 4a)~4b), right?
I am confused why the sequential write bandwidth is stuck at 213~220 MB/s while the write throughput of the journal SSDs and data disks is not saturated (SSD ~120 MB/s, HDD ~80 MB/s).

- Kelvin

-----Original Message-----
1. The rados bench write is "creating objects", so it's more like sequential than random. If you use XFS with default mkfs parameters, you can use FIO to generate random 4M writes on top of your disk; this shares a similar access pattern with Ceph.
2. rados bench cannot scale well to large concurrency. You could use a lower concurrency (say 32 or 64) but run multiple rados bench instances to work around this issue.
3. The performance result seems fair; you can refer to the Ceph official blog. If my memory is correct, Mark has seen performance similar to yours.

Sent from my iPhone

On 2013-4-6, at 13:12, "Kelvin_Huang@xxxxxxxxxx" <Kelvin_Huang@xxxxxxxxxx> wrote:

Hi all,

I have some questions after my RBD performance test.

Setup:
Linux kernel: 3.6.11
OS: Ubuntu 12.04
RAID card: LSI MegaRAID SAS 9260-4i
  (every HDD is a single-drive RAID0; Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct)
Storage server number: 1
Storage server: 8 * HDD (each storage server has 8 OSDs; 7200 rpm, 2 TB)
                4 * SSD (2 OSDs share 1 SSD as journal; each SSD is divided into two partitions sdx1, sdx2)
Ceph version: 0.56.4
Replicas: 2
Monitor number: 1

The write speed of an HDD:
# dd if=/dev/zero of=/dev/sdd bs=1024k count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 69.3961 s, 151 MB/s

The write speed of an SSD:
# dd if=/dev/zero of=/dev/sdb bs=1024k count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 40.8671 s, 257 MB/s

Then we used the RADOS benchmark and collectl to observe write performance:
# rados -p rbd bench 300 write -t 256
2013-04-05 14:31:13.732737 min lat: 4.28207 max lat: 5.92085 avg lat: 4.78598
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  300     256     16043     15787   210.455       196      5.91   4.78598
Total time run:         300.588962
Total writes made:      16043
Write size:             4194304
Bandwidth (MB/sec):     213.488
Stddev Bandwidth:       40.6795
Max bandwidth (MB/sec): 288
Min bandwidth (MB/sec): 0
Average Latency:        4.75647
Stddev Latency:         0.37182
Max latency:            5.93183
Min latency:            0.590936
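(A rough sketch of the multi-instance workaround from point 2 of Xiaxi's reply above, for anyone who wants to try it; it is untested, and the instance count and per-instance concurrency are only placeholders:)

for i in 1 2 3 4; do
    # several rados bench writers with a lower concurrency each,
    # instead of a single instance with -t 256
    rados -p rbd bench 300 write -t 32 &
done
# wait for all instances to finish, then add up the reported bandwidths by hand
wait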
collectl on the OSD data disks:
# collectl --iosize -sCDN --dskfilt "sd(c|d|e|f|g|h|i|j)"

# DISK STATISTICS (/sec)
#        <---------reads---------><---------writes---------><--------averages--------> Pct
#Name    KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc           0      0    0    0   76848    563  460  167     167    12    26      0   42
sdd           0      0    0    0   45100      0  165  273     273     6    36      1   30
sde           0      0    0    0   73800      0  270  273     273     3    14      1   41
sdf           0      0    0    0   73800      0  270  273     273    17    64      1   33
sdg           0      0    0    0   41000      0  150  273     273     1     7      0   10
sdh           0      0    0    0   57400      0  210  273     273     4    20      1   27
sdi           0      0    0    0   36904      0  136  271     271     0     5      0    7
sdj           0      0    0    0   77776      0  285  273     272    28    87      1   48

collectl on the journal SSDs:
# collectl --iosize -sCDN --dskfilt "sd(b|k|l|m)"

# DISK STATISTICS (/sec)
#        <---------reads---------><---------writes---------><--------averages--------> Pct
#Name    KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdb           0      0    0    0  115552      0  388  298     297    75   159      2   77
sdk           0      0    0    0  114592      0  389  295     294    12    33      0   38
sdl           0      0    0    0  100364      0  334  300     300    35   148      2   69
sdm           0      0    0    0  101644      0  345  295     294   245   583      2   99  <= almost 99%

My questions are:
1. Is the rados benchmark write a random write?
2. Why does the write bandwidth hit a ceiling at 213 MB/s even when I increase the concurrency (-t 512)? It looks odd, because collectl shows each SSD's write throughput is only 100~120 MB/s, while the SSDs should be able to do ~250 MB/s.
3. Why is one SSD (sdm) at almost 99% [Util]? Does that mean the data written to the OSDs is not evenly distributed?
4. If the SSDs are not the write bottleneck, what is?
5. How can I improve write performance?

Thanks!!

- Kelvin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
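(As a follow-up to questions 2 and 4: a minimal fio sketch for checking what one journal SSD can sustain with large, direct, synchronous writes outside of Ceph. It assumes fio is available; /dev/sdX is a placeholder, and the block size, queue depth and runtime are illustrative, only roughly approximating the journal's real I/O pattern:)

# WARNING: this writes to the raw device and destroys its contents;
# point --filename at a spare SSD or a scratch partition, never a live journal.
fio --name=journal-ssd-check --filename=/dev/sdX \
    --rw=write --bs=4M --direct=1 --sync=1 \
    --ioengine=libaio --iodepth=16 --numjobs=1 \
    --runtime=60 --time_based --group_reporting

If the result is close to the dd figure above (~257 MB/s) while collectl still shows only ~100-120 MB/s per SSD under rados bench, the ceiling is more likely in the OSD/journal path or replication than in the SSDs themselves.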