Re: Rados performance inconsistencies, lower than expected performance

Menno Zonneveld <menno@xxxxxxxx> · Mon, 10 Sep 2018 11:44:56 +0200

I filled up the cluster by accident by not supplying --no-cleanup to the write benchmark; I'm sure there must be a better way to do that, though.

I've run the tests again, first with the cluster 'empty' (I have a few test VMs stored on CEPH) and then after letting it fill up again.

Performance goes up from 276.812 to 433.859 MB/sec and average latency goes down from 0.231178 s to 0.147433 s.

I do have to mention that I found a problem with the cluster thanks to Alwin's suggestion to (re)do the fio benchmarks: one server with identical SSDs is performing poorly compared to the others. I'll resolve this first before continuing with other benchmarks.

When empty:

# ceph df

GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED 
    3784G     2488G        1295G         34.24 
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS 
    ssd          1      431G     37.33          723G      110984 
    rbdbench     76        0         0          723G           0 

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup

Total time run:         180.223580
Total writes made:      12472
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     276.812
Stddev Bandwidth:       66.2295
Max bandwidth (MB/sec): 524
Min bandwidth (MB/sec): 112
Average IOPS:           69
Stddev IOPS:            16
Max IOPS:               131
Min IOPS:               28
Average Latency(s):     0.231178
Stddev Latency(s):      0.19153
Max latency(s):         1.16432
Min latency(s):         0.022585

And after a few benchmarks, when I hit CEPH's near-full warning:

# ceph df

GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED 
    3784G      751G        3032G         80.13 
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS 
    ssd          1      431G     82.93        90858M      110984 
    rbdbench     76     579G     86.73        90858M      148467 
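
As a sanity check on the two `ceph df` outputs above, here is a minimal sketch that compares the reported RAW USED with what the pool sizes imply. It assumes both pools are 3x replicated (the test pool is, per the rest of this thread; for the ssd pool I'm going by "osd pool default size = 3"), and the function name is just something I made up for illustration.

```python
# Figures copied from the two `ceph df` outputs above (in G, as printed).
REPLICAS = 3      # assumption: both pools use "osd pool default size = 3"
TOTAL_G = 3784    # GLOBAL SIZE

def expected_raw_used(pool_used_g, replicas=REPLICAS):
    """Raw space the pools should occupy if every object is stored `replicas` times."""
    return sum(pool_used_g) * replicas

empty = expected_raw_used([431, 0])    # ssd + rbdbench when 'empty'
full = expected_raw_used([431, 579])   # ssd + rbdbench at near-full

print(f"empty: expected {empty}G raw, ceph df reports 1295G")
print(f"full:  expected {full}G raw, ceph df reports 3032G "
      f"({full / TOTAL_G:.1%} of {TOTAL_G}G)")
```

Both expected values come out within a couple of G of what `ceph df` reports, so the raw usage looks like plain 3x replication of the two pools with nothing unexpected on top.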

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup

Total time run:         180.233495
Total writes made:      19549
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     433.859
Stddev Bandwidth:       73.0601
Max bandwidth (MB/sec): 584
Min bandwidth (MB/sec): 220
Average IOPS:           108
Stddev IOPS:            18
Max IOPS:               146
Min IOPS:               55
Average Latency(s):     0.147433
Stddev Latency(s):      0.103518
Max latency(s):         1.08162
Min latency(s):         0.0218688
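
One thing that stands out in both result sets: with a fixed number of writes in flight, throughput should be roughly the in-flight data divided by the average latency (Little's law). A minimal sketch of that check, using only the numbers from the two runs above; the function name is my own, and "MB" here means 4194304 / 4 bytes, which is how I read the rados bench output.

```python
# With a fixed number of writes in flight, throughput is bounded by
# in-flight data divided by average latency (Little's law).
THREADS = 16     # -t 16
OBJECT_MB = 4    # -b 4M (4194304 bytes)

def implied_bandwidth(avg_latency_s, threads=THREADS, object_mb=OBJECT_MB):
    """MB/sec that `threads` concurrent writes of `object_mb` each can sustain."""
    return threads * object_mb / avg_latency_s

for label, latency, reported in [
    ("empty", 0.231178, 276.812),
    ("near-full", 0.147433, 433.859),
]:
    implied = implied_bandwidth(latency)
    print(f"{label}: implied {implied:.1f} MB/sec, reported {reported} MB/sec")
```

In both runs the implied figure is within 1% of the reported bandwidth, which suggests (but does not prove) that the benchmark is bound by per-write latency at 16 threads rather than by raw disk or network throughput. Running with a higher -t would be one way to test that.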

-----Original message-----
> From:Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Sent: Thursday 6th September 2018 17:15
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Menno Zonneveld <menno@xxxxxxxx>
> Subject: RE:  Rados performance inconsistencies, lower than expected performance
> 
> 
> It is idle, still in testing, with backups running on it at night.
> How do you fill up the cluster so you can test between empty and full? 
> Do you have a "ceph df" from empty and full? 
> 
> I have done another test disabling new scrubs on the rbd.ssd pool (but 
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec):     395.749
> Average Latency(s):     0.161713
> 
> 
> -----Original Message-----
> From: Menno Zonneveld [mailto:menno@xxxxxxxx] 
> Sent: Thursday 6 September 2018 16:56
> To: Marc Roos; ceph-users
> Subject: RE:  Rados performance inconsistencies, lower than 
> expected performance
> 
> The benchmark does fluctuate quite a bit, which is why I run it for 180 
> seconds now; that way I do get consistent results.
> 
> Your performance seems on par with what I'm getting with 3 nodes and 9 
> OSD's, not sure what to make of that.
> 
> Are your machines actively used perhaps? Mine are mostly idle as it's 
> still a test setup.
> 
> -----Original message-----
> > From:Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> > Sent: Thursday 6th September 2018 16:23
> > To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Menno Zonneveld 
> > <menno@xxxxxxxx>
> > Subject: RE:  Rados performance inconsistencies, lower 
> > than expected performance
> > 
> > 
> > 
> > I am on 4 nodes, mostly hdds, plus 4x Samsung SM863 480GB, 2x E5-2660,
> > 2x LSI SAS2308, 1x dual-port 10Gbit (one port used, and shared between
> > cluster/client vlans).
> > 
> > I have 5 pg's scrubbing, but I am not sure if there is any on the ssd 
> > pool. I am noticing a drop in the performance at the end of the test.
> > Maybe some caching on the ssd?
> > 
> > rados bench -p rbd.ssd 60 write -b 4M -t 16
> > Bandwidth (MB/sec):     448.465
> > Average Latency(s):     0.142671
> > 
> > rados bench -p rbd.ssd 180 write -b 4M -t 16
> > Bandwidth (MB/sec):     381.998
> > Average Latency(s):     0.167524
> > 
> > 
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:menno@xxxxxxxx]
> > Sent: Thursday 6 September 2018 15:52
> > To: Marc Roos; ceph-users
> > Subject: RE:  Rados performance inconsistencies, lower 
> > than expected performance
> > 
> > ah yes, 3x replicated with minimal 2.
> > 
> > 
> > my ceph.conf is pretty bare, just in case it might be relevant
> > 
> > [global]
> > auth client required = cephx
> > auth cluster required = cephx
> > auth service required = cephx
> > 
> > cluster network = 172.25.42.0/24
> > 
> > fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> > 
> > keyring = /etc/pve/priv/$cluster.$name.keyring
> > 
> > mon allow pool delete = true
> > mon osd allow primary affinity = true
> > 
> > osd journal size = 5120
> > osd pool default min size = 2
> > osd pool default size = 3
> > 
> > 
> > -----Original message-----
> > > From:Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> > > Sent: Thursday 6th September 2018 15:43
> > > To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Menno Zonneveld 
> > > <menno@xxxxxxxx>
> > > Subject: RE:  Rados performance inconsistencies, lower 
> > > than expected performance
> > > 
> > >  
> > > 
> > > Test pool is 3x replicated?
> > > 
> > > 
> > > -----Original Message-----
> > > From: Menno Zonneveld [mailto:menno@xxxxxxxx]
> > > Sent: Thursday 6 September 2018 15:29
> > > To: ceph-users@xxxxxxxxxxxxxx
> > > Subject:  Rados performance inconsistencies, lower than 
> > > expected performance
> > > 
> > > I've set up a CEPH cluster to test things before going into 
> > > production, but I've run into some performance issues that I cannot 
> > > resolve or explain.
> > > 
> > > Hardware in use in each storage machine (x3)
> > > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 
> > > 9000)
> > > - dual 10Gbit EdgeSwitch 16-Port XG
> > > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > > - 3x Intel S4500 480GB SSD as OSD's
> > > - 2x SSD raid-1 boot/OS disks
> > > - 2x Intel(R) Xeon(R) CPU E5-2630
> > > - 128GB memory
> > > 
> > > Software-wise I'm running CEPH 12.2.7-pve1, set up from Proxmox VE 5.2,
> > > on all nodes.
> > > 
> > > Running rados benchmark resulted in somewhat lower than expected 
> > > performance unless ceph enters the 'near-full' state. When the cluster
> > > is mostly empty, rados bench (180 write -b 4M -t 16) results in about 
> > > 330MB/s with 0.18s latency, but when hitting the near-full state this
> > > goes up to a more expected 550MB/s and 0.11s latency.
> > > 
> > > iostat on the storage machines shows the disks are hardly utilized 
> > > unless the cluster hits near-full; CPU and network also aren't maxed 
> > > out. I’ve also tried with NIC bonding and just one switch, without 
> > > jumbo frames, but nothing seems to matter in this case.
> > > 
> > > Is this expected behavior, or what can I try to do to pinpoint the 
> > > bottleneck?
> > > 
> > > The expected performance is based on the benchmark results Proxmox 
> > > released this year: they have 4 OSDs per server and hit almost 
> > > 800MB/s with 0.08s latency using 10Gbit and 3 nodes. Since they have
> > > more OSDs and somewhat different hardware, I understand I won't hit 
> > > the 800MB/s mark, but the difference between an empty and an almost
> > > full cluster makes no sense to me; I'd expect it to be the other way
> > > around.
> > > 
> > > Thanks,
> > > Menno
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com