Hello,

On Mon, 19 Dec 2016 13:29:07 +0800 (CST) 马忠明 wrote:

> Hi guys,
>
> So recently I was testing our ceph cluster, which is mainly used for block usage (rbd).
>
> We have 30 ssd drives total (5 storage nodes, 6 ssd drives each node). However, the result of fio is very poor.
>
All relevant details are missing:
exact SSD models, CPU/RAM config, network config, Ceph, OS/kernel and fio
versions, and the config you tested this with, as in replication.

> We tested the workload on the ssd pool with the following parameters:
>
> "fio --size=50G \
>      --ioengine=rbd \
>      --direct=1 \
>      --numjobs=1 \
>      --rw=randwrite(randread) \
>      --name=com_ssd_4k_randwrite(randread) \
>      --bs=4k \
>      --iodepth=32 \
>      --pool=ssd_volumes \
>      --runtime=60 \
>      --ramp_time=30 \
>      --rbdname=4k_test_image"
>
> and here is the result:
>
> random write:4631;random read:21127
>
> I also tested a pool (size=1, min_size=1, pg_num=256) consisting of only one single ssd drive with the same workload pattern, which is more acceptable. (random write:8303;random read:27859)
>
I'm only going to comment on the write part.

On my staging cluster (* see below) I ran your fio against the cache tier
(so only SSDs involved) with this result:

  write: io=4206.3MB, bw=71784KB/s, iops=17945, runt= 60003msec
    slat (usec): min=0, max=531, avg= 3.26, stdev=11.33
    clat (usec): min=5, max=41996, avg=1770.23, stdev=2260.61
     lat (usec): min=9, max=41997, avg=1773.36, stdev=2260.60

So more than 2 times better than your non-replicated test (8303 IOPS),
and that is with a replication of 2.

4k randwrites stress the CPUs (run atop or such on your OSD nodes when
doing a test run), so this might be your limit here.
Other candidates are less than optimal SSDs or a high latency network.

Christian

* Staging cluster:
---
4 nodes running latest Hammer under Debian Jessie (with sysvinit, kernel
4.6) and manually created OSDs.
Infiniband (IPoIB) QDR (40Gb/s, about 30Gb/s effective) between all nodes.
2 HDD OSD nodes with 32GB RAM, fast enough CPU (E5-2620 v3), 2x 200GB DC
S3610 for OS and journals (2 per SSD), 4x 1GB 2.5" SATAs for OSDs.
For my amusement and edification the OSDs of one node are formatted with
XFS, the other one with EXT4 (as on all my production clusters).

The 2 SSD OSD nodes have 1x 200GB DC S3610 (OS and 4 journal partitions)
and 2x 400GB DC S3610s (2 180GB partitions each, so 8 SSD OSDs total),
same specs as the HDD nodes otherwise.
Also one node with XFS, the other with EXT4.

Pools are size=2, min_size=1, obviously.
---

> We have optimized the Linux kernel (read_ahead, disk_scheduler, numa, swappiness) and ceph.conf (client_message, filestore_queue, journal_queue, rbd_cache), and checked the raid cache setting.
>
> The only deficiency of the architecture is the unbalanced weight between the three racks, one of which has only one storage node.
>
> So can anybody tell us whether this number is reasonable? If not, any suggestion to improve the number will be appreciated.
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
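
P.S. A quick way to sanity-check fio numbers like the ones in this thread is
Little's law: with a constant queue depth, IOPS is roughly iodepth divided
by the average completion latency. Below is a minimal Python sketch using
only figures quoted in this mail. The helper names are made up for
illustration and are not part of fio or Ceph, and the estimate assumes the
queue stayed full (iodepth=32) for the whole run.

```python
# Sanity check of the fio figures quoted in this thread.
# The helper functions are hypothetical, not part of fio or Ceph.

def iops_from_latency(iodepth, avg_lat_usec):
    """Little's law: IOPS ~= outstanding I/Os / average latency in seconds."""
    return iodepth / (avg_lat_usec / 1_000_000)

def latency_usec_from_iops(iodepth, iops):
    """Inverse of the above: average latency implied by an IOPS figure."""
    return iodepth / iops * 1_000_000

def bandwidth_kb_s(iops, block_size_kb):
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_size_kb

IODEPTH = 32
staging_iops = 17945     # 4k randwrite, size=2 cache tier (reported above)
single_ssd_iops = 8303   # 4k randwrite, size=1 single-SSD pool (reported above)
cluster_iops = 4631      # 4k randwrite, 30-SSD pool (reported above)

predicted = iops_from_latency(IODEPTH, avg_lat_usec=1773.36)
implied_lat = latency_usec_from_iops(IODEPTH, cluster_iops)

print(f"IOPS predicted from avg latency: {predicted:.0f} (fio: {staging_iops})")
print(f"Bandwidth implied by IOPS: {bandwidth_kb_s(staging_iops, 4)} KB/s (fio: 71784)")
print(f"Staging vs single-SSD pool: {staging_iops / single_ssd_iops:.2f}x")
print(f"Staging vs 30-SSD pool: {staging_iops / cluster_iops:.2f}x")
print(f"Avg write latency implied for the 30-SSD pool: {implied_lat / 1000:.1f} ms")
```

If the assumption holds, the last line gives the per-write latency to look
for on the 30-SSD pool, which is the figure to compare against what atop
and the network show during a test run.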