Hi Mark,

Sorry for the late reply; I didn't receive this mail, so I missed the message for several days...
http://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg00624.html

Your advice is very, very helpful to me!!! Thanks :)
I have done the following tests and have some questions.
1)
I concurrently ran

    dd if=/dev/zero of=/dev/sd[b,c,d,e,f...n] bs=4096k count=10000 oflag=direct

on each SATA disk. collectl shows:

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   0   0  2935    636      0      0 866560   2708      1      9      0       1
   0   0  2939    718      0      0 865620   2708      2     14      1       4
   0   0  2872    631      0      0 868480   2714      1      8      0       1
   0   0  2937    621      0      0 864640   2702      1      9      0       4

Total write throughput is about 860MB/s.

Using RADOS bench:

    rados -p rbd bench 300 write -t 256

collectl shows:

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  22  10  6991  17947      4      1   999K   3111      4     31     48      22
  18   8  6151  16116      0      0  1003K   2858      8     40     23      37
  19   9  6295  16031      8      2  1002K   2458      2     22     44      17

Total write throughput is about 1000MB/s.

The expander backplane runs at 3.0Gb/s and is attached through a 4-lane Mini-SAS port: 4 * 3Gb/s = 12Gb/s, which is roughly 1.2GB/s of usable bandwidth after 8b/10b encoding. So I think the write throughput is stuck at about 1000MB/s because the expander backplane is the bottleneck for sequential writes. If the expander backplane could run at 6.0Gb/s, the total write throughput should increase, right?
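For reference, the concurrent dd runs above were launched roughly like this (just a sketch; the /dev/sd{b..n} range is simply how the data disks happen to be named on my node):

    # start one direct-I/O sequential writer per SATA data disk, all in parallel
    for dev in /dev/sd{b..n}; do
        dd if=/dev/zero of="$dev" bs=4096k count=10000 oflag=direct &
    done
    wait    # return only after every dd has finished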
2)
OSD & journal settings:

a. The OSD filesystems are EXT4 (no osd mkfs options set):
   osd mkfs type = ext4
   osd mount options ext4 = rw,data=
   filestore_xattr_use_omap = true
b. The SSD journals are raw devices with no filesystem; each SSD is divided into two (aligned) partitions.

LSI MegaRAID SAS 9260-4i settings:

a. Every HDD: RAID0, Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct, Disk cache: unchanged
b. Every SSD: RAID0, Write Policy: Write Through, Read Policy: NoReadAhead, IO Policy: Direct, Disk cache: disabled

Because the previous results used a pool with 576 placement groups, I did a new test with pg_num=2048 and a 9 OSDs + 3 SSDs configuration!!

Read:  rados -p testpool bench 300 seq -t 256
Write: rados -p testpool bench 300 write -t 256 --no-cleanup

Rados Bench TEST (Read):
2x replication & 12 OSDs case:                   Bandwidth (MB/sec): 1373.013
1x replication & 12 OSDs case:                   Bandwidth (MB/sec): 1478.694
2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 1442.543
1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 1448.407
2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 1485.175
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 447.245

Rados Bench TEST (Write):
2x replication & 12 OSDs case:                   Bandwidth (MB/sec): 228.064
1x replication & 12 OSDs case:                   Bandwidth (MB/sec): 457.213
2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 224.430
1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 482.104
2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 239
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 485

In the Rados Bench read tests I originally expected more OSDs to increase the read bandwidth, but the results show about 1400MB/s in most cases. Is this the page cache intervening? I didn't see any read operations on the disks...
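A quick way to rule out the page cache next time (just a sketch, run as root on every OSD node right before the read benchmark) would be:

    sync                                  # flush dirty data first
    echo 3 > /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes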
In the Rados Bench write test I ran the 9 OSDs + 3 SSDs (Journal) configuration and watched it with collectl:

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc              0      0    0    0   47336    382  335  141     141     0     2      0    7
sdd              0      0    0    0   65600      0  240  273     273     1     4      0    7
sde              0      0    0    0   56440      0  207  273     272    64   342      4   99
sdg              0      0    0    0   43544    450  326  134     133    39   135      3   99
sdf              0      0    0    0   65600      0  240  273     273     0     2      0    7
sdh              0      0    0    0   57400      0  210  273     273     0     2      0    7
sdi              0      0    0    0   69012    227  251  275     274    90   560      3   99
sdj              0      0    0    0   66944    424  308  217     217     1     5      0    7
sdb              0      0    0    0   43496      0  159  274     273     6    41      3   51

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdk              0      0    0    0  178016      0  599  297     297   219   390      1   99
sdl              0      0    0    0  156540      0  536  292     292    27    51      1   96
sdm              0      0    0    0  166724      0  578  288     288    31    42      1   95

The write throughput of the SSDs (sdk/sdl/sdm) increased to about 170MB/s each, which suggests three SSDs are enough for the journals.
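(For reference, the per-disk table above is collectl's disk-detail view; as far as I remember the invocations were roughly:

    collectl -scdn    # summary view: CPU / disks / network totals
    collectl -sD      # detail view: per-disk DISK STATISTICS
)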
But I don't understand why 9 OSDs + 3 SSDs gets more write bandwidth than 8 OSDs + 4 SSDs...?

Another question: 1x replication gives about 490MB/s of write bandwidth on one storage server. Does that mean that if I have two storage servers and use 2x replication with one copy on each server, the write bandwidth can still reach 490MB/s?
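If I understand correctly, the settings involved in that two-server case would be something like the following ceph.conf lines (a sketch with illustrative values, assuming the default CRUSH map where each host is its own failure domain):

    [global]
    osd pool default size = 2        # keep two copies of each object
    osd crush chooseleaf type = 1    # spread the copies across hosts (bucket type 1 = host)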
3)
I used an HBA (LSI 9211-8i: LSISAS2008) instead of the "smart" RAID card (LSI 9260-4i):

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 247.047
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 492.798

The results look a little better than with the RAID card!!
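(With the HBA I should also be able to confirm the negotiated link speed toward the expander/backplane from sysfs; a sketch, assuming the LSISAS2008 exposes its phys under /sys/class/sas_phy:

    # print the negotiated rate of every SAS phy, including the wide port to the expander
    grep . /sys/class/sas_phy/*/negotiated_linkrate
)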
4)
XFS:
   osd mkfs type = xfs
   osd mkfs options xfs = -f -i size=2048
   osd mount options xfs = rw,inode64,noatime
   filestore_xattr_use_omap = true

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 248.327
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 494.292

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  22   9  6416  21901      0      0  1004K   2514     10     39    113      37
  20   9  6147  20391      0      0  1009K   2565      3     39     62      32
  19   8  6488  19780      0      0  1006K   2448      7     31     21      23
  21  10  6730  19695      0      0  1003K   2430      5     16      1      11
  20   9  6755  19244      0      0   994K   2491      3     15     25      12

Total write throughput is also stuck at about 1GB/s; I think for the same reason as in 1).

BTRFS:
   osd mkfs type = btrfs
   osd mkfs options btrfs = -l 16k -n 16k
   osd mount options btrfs = -o noatime

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 243.817
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 487.809

The results are about the same as with EXT4 / XFS.

Next time I will try a second controller and connect my disks and SSDs directly.
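For reference, those mkfs/mount settings correspond roughly to the following manual commands (a sketch; /dev/sdX and the OSD number N are placeholders):

    mkfs.xfs -f -i size=2048 /dev/sdX                               # XFS with 2048-byte inodes
    mount -o rw,inode64,noatime /dev/sdX /var/lib/ceph/osd/ceph-N

    mkfs.btrfs -l 16k -n 16k /dev/sdX                               # btrfs with 16k leaf/node size
    mount -o noatime /dev/sdX /var/lib/ceph/osd/ceph-N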
Thanks!! :)
-
Kelvin