The SSDs you're using are rated for around 16,000 random-write IOPS each.
Once you account for the 3-way replication across your 12 disks, you effectively only have 4 physical disks' worth of write I/O, because the other disks are busy writing the 2nd and 3rd copies of each object.
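To put rough numbers on that (a back-of-envelope sketch; the 12-disk count and the ~16,000 per-disk rating come from this thread, everything else is an estimate):

```python
# Back-of-envelope client write-IOPS ceiling under 3-way replication.
# 12 data SSDs, each rated ~16,000 random-write IOPS (numbers from this thread).
disks = 12
replicas = 3
per_disk_write_iops = 16_000

raw_iops = disks * per_disk_write_iops   # total device-level budget
effective = raw_iops // replicas         # each client write costs 3 device writes
print(raw_iops, effective)               # 192000 64000
```

The observed ~20k is roughly a third of that 64k ceiling, which is plausible once metadata writes, network round-trips, and daemon CPU overhead are subtracted.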
So with the overhead of also writing metadata and other I/O onto the SSDs, plus the network and general Ceph overhead, I'd say 20,000 is pretty good.
If you wanted higher IOPS you would need higher-performance disks, but ideally more OSDs: in my experience (and from general reading on this list) Ceph scales better when you add more OSDs than when you use fewer but faster disks.
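It may also be worth benchmarking the pool directly, bypassing the VM and filesystem layers, to see where the bottleneck is. A sketch of a fio job using its rbd engine (the image name here is hypothetical; this assumes the default admin client keyring is available on the test host):

[global]
ioengine=rbd
clientname=admin
pool=vmstor
rbdname=fio-test-img
bs=8k
rw=randwrite
direct=1
iodepth=8
time_based=1
runtime=60

[rbd-write-test]

Comparing this result against the in-VM numbers would show how much the QEMU/filesystem path is costing.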
Ash
On Wed, Feb 27, 2019 at 5:56 PM Weird Deviations <malblw05@xxxxxxxxx> wrote:
Hello all,

I'm facing poor performance on RBD images.

First, my lab's hardware consists of 3 Intel servers, each with:
- 2 Intel Xeon E5-2660 v4 (all power-saving features disabled in BIOS)
- S2600TPR motherboard
- 256 GB RAM
- 4 SATA SSDs, Intel DC S3520 960 GB, for OSDs
- 2 SATA SSDs, Intel DC S3520 480 GB, for the OS
- 1 PCIe NVMe, Intel DC P3700 Series 800 GB, for the writeback pool
- dual-port ixgbe 10 Gb/s NIC

All of this runs under CentOS 7.6 on kernel 4.14.15-1.el7.elrepo.x86_64.
Network interfaces run in teaming.
Each of these 3 servers acts as mon host, OSD host, mgr host and RBD host:

ceph -s
  cluster:
    id: 6dc5b328-f8be-4c52-96b7-d20a1f78b067
    health: HEALTH_WARN
            Failed to send data to Zabbix
            1548 slow ops, oldest one blocked for 63205 sec, mon.alfa-csn-03 has slow ops
  services:
    mon: 3 daemons, quorum alfa-csn-01,alfa-csn-02,alfa-csn-03
    mgr: alfa-csn-03(active), standbys: alfa-csn-02, alfa-csn-01
    osd: 27 osds: 27 up, 27 in
    rgw: 3 daemons active
  data:
    pools:   8 pools, 2592 pgs
    objects: 219.0 k objects, 810 GiB
    usage:   1.3 TiB used, 9.4 TiB / 11 TiB avail
    pgs:     2592 active+clean

I created 2 OSDs per SSD to store data, and 1 OSD on NVMe for write cache.

I also created an erasure profile:

crush-device-class=
crush-failure-domain=host
crush-root=default
k=2
m=1
plugin=isa
technique=reed_sol_van

and organized pool `vmstor' under this profile with 1024 pg and pgp.

Here is the crush rule for the `vmstor' pool:

rule vmstor {
    id 1
    type erasure
    min_size 3
    max_size 3
    step set_chooseleaf_tries 50
    step set_choose_tries 100
    step take data
    step chooseleaf indep 0 type host-data
    step emit
}

host-data alfa-csn-01-ssd {
    id -5           # do not change unnecessarily
    id -6 class ssd # do not change unnecessarily
    alg straw2
    hash 0          # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
    item osd.2 weight 1.000
    item osd.3 weight 1.000
    item osd.4 weight 1.000
    item osd.5 weight 1.000
    item osd.6 weight 1.000
    item osd.7 weight 1.000
}

host-data alfa-csn-02-ssd {
    id -7           # do not change unnecessarily
    id -8 class ssd # do not change unnecessarily
    alg straw2
    hash 0          # rjenkins1
    item osd.8 weight 1.000
    item osd.9 weight 1.000
    item osd.10 weight 1.000
    item osd.11 weight 1.000
    item osd.12 weight 1.000
    item osd.13 weight 1.000
    item osd.14 weight 1.000
    item osd.15 weight 1.000
}

host-data alfa-csn-03-ssd {
    id -9            # do not change unnecessarily
    id -10 class ssd # do not change unnecessarily
    alg straw2
    hash 0           # rjenkins1
    item osd.16 weight 1.000
    item osd.17 weight 1.000
    item osd.18 weight 1.000
    item osd.19 weight 1.000
    item osd.20 weight 1.000
    item osd.21 weight 1.000
    item osd.22 weight 1.000
    item osd.23 weight 1.000
}

A pool named `wb-vmstor' was also created with 256 pg and pgp as a hot tier for `vmstor':

rule wb-vmstor {
    id 4
    type replicated
    min_size 2
    max_size 3
    step take wb
    step set_chooseleaf_tries 50
    step set_choose_tries 100
    step chooseleaf firstn 0 type host-wb
    step emit
}

Then pool `vmstor' was initialized as an RBD pool, and a few images were created in it. These images were attached as disks to 2 qemu-kvm virtual machines, 4 images per VM, using native RBD support in QEMU. The QEMU hosts are the same kind of (but separate) servers, i.e. Xeon E5-2660 v4, 256 GB RAM and so on.

Then fio tests were performed on these disks. Results:
1) using the virtual drives as raw block devices, I got about 400 IOPS on random write with 4 KB or 8 KB (or any other size up to 1 MB) blocks
2) after I created filesystems on these drives and mounted them, I got about 20k IOPS.

And it doesn't matter whether I run the test on a single VM or on both: I get 20k IOPS in total. I mean, if I run the fio test on one VM I get 20k IOPS; if I run it on 2 VMs I get 10k IOPS on each VM.

My fio job is:

[global]
numjobs=1
ioengine=libaio
buffered=0
direct=1
bs=8k
rw=randrw
rwmixread=0
iodepth=8
group_reporting=1
time_based=1

[vdb]
size=10G
directory=/mnt
filename=vdb

[vdc]
size=10G
directory=/mnt1
filename=vdc

[vdd]
size=10G
directory=/mnt2
filename=vdd

[vde]
size=10G
directory=/mnt3
filename=vde

[vdf]
size=10G
directory=/mnt4
filename=vdf

To my mind that result is not good, and I suspect this hardware and Ceph can produce much more. Please help me find what I'm doing wrong.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com