On 06/10 14:05, Marcel Kuiper wrote:
> Hi David,
>
> That is very helpful, thank you. When looking at the graphs I notice
> that the bandwidth used appears to be very low. Or am I
> misinterpreting the bandwidth graphs?

Hey, sorry for the delay, something broke :)

Which graphs specifically are you looking at?

> Regards
>
> Marcel
>
> David Caro wrote on 2021-06-10 11:49:
> > We have a similar setup, way smaller though (~120 OSDs right now) :)
> >
> > We have different capped VMs, but most have a 500 write / 1000 read
> > IOPS cap; you can see it in effect here:
> > https://cloud-ceph-performance-tests.toolforge.org/
> >
> > We are currently running Octopus v15.2.11.
> >
> > It's a very 'bare' UI (under construction), but check the
> > 'after_ceph_upgrade_v2' run, for example the 'vm_disk' suite and the
> > 'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' or
> > 'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests
> > that hit the cap.
> >
> > From there you can also see the numbers of the tests running
> > uncapped (in the 'rbd_from_hypervisor' or 'rbd_from_osd' suites).
> >
> > You can see the current IOPS of our Ceph cluster here:
> > https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1
> >
> > Of our OpenStack setup:
> > https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?orgId=1&refresh=15m
> >
> > And some details on the traffic OpenStack puts on each Ceph OSD host
> > here:
> > https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1&refresh=5m
> >
> > We are working on revamping those graphs right now, so it might
> > become easier to see numbers in a few weeks.
> >
> >
> > We don't usually see slow ops with the current load, though we
> > recommend not using Ceph for very latency-sensitive VMs (like etcd),
> > as on the network layer there are some hardware limits we can't
> > remove right now.
> >
> > Hope that helps.
> >
> > On 06/10 10:54, Marcel Kuiper wrote:
> > > Hi
> > >
> > > We're running Ceph Nautilus 14.2.21 (going to latest Octopus in a
> > > few weeks) as the volume and instance backend for our OpenStack
> > > VMs. Our clusters run somewhere between 500 and 1000 OSDs on SAS
> > > HDDs, with NVMes as journal and DB devices.
> > >
> > > Currently we do not have our VMs capped on IOPS or throughput. We
> > > regularly get slow ops warnings (once or twice per day) and wonder
> > > whether there are other users with roughly the same setup who do
> > > throttle their OpenStack VMs.
> > >
> > > - What kind of numbers are used in the field for IOPS and
> > >   throughput limiting?
> > >
> > > - As a side question, is there an easy way to get rid of the slow
> > >   ops warning besides restarting the involved OSD? Otherwise the
> > >   warning seems to stay forever.
> > >
> > > Regards
> > >
> > > Marcel

--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share
in the sum of all knowledge. That's our commitment."
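
For context, per-VM caps like the 500 write / 1000 read IOPS mentioned
above are usually applied on the OpenStack side rather than inside
Ceph. A minimal sketch of the two common mechanisms, assuming a flavor
named 'standard' and a volume type named 'capped' (both names are
placeholders; this is not necessarily how the Wikimedia cloud does it):

    # Cap root/ephemeral disks via Nova flavor extra specs
    # (enforced by libvirt on the hypervisor):
    openstack flavor set standard \
        --property quota:disk_read_iops_sec=1000 \
        --property quota:disk_write_iops_sec=500

    # Cap Cinder volumes via a QoS spec tied to a volume type:
    openstack volume qos create iops-cap --consumer front-end \
        --property read_iops_sec=1000 --property write_iops_sec=500
    openstack volume qos associate iops-cap capped

Front-end QoS is enforced by the hypervisor, so it applies per attached
volume; existing attachments only pick up new limits after a detach/
reattach or an instance restart.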
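
The 'RunConfig(...)' entries quoted above map one-to-one onto fio
options, so the capped 4k random-read case can be reproduced against
any test file or attached disk. A sketch, where the file name, size,
and runtime are arbitrary choices and not taken from the test suite:

    fio --name=randread-qd1 --rw=randread --bs=4096 \
        --ioengine=libaio --iodepth=1 --direct=1 \
        --filename=/tmp/fio-test --size=1G --runtime=60 --time_based

With iodepth=1 this measures single-outstanding-IO latency, which is
why these runs are the ones that sit right at the IOPS cap.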
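
On the slow ops question at the end of the thread: besides restarting
the OSD, the stuck ops can at least be inspected through the daemon's
admin socket to see what they were blocked on. A sketch using standard
Ceph tooling, with osd.12 as a placeholder id:

    # list which OSDs are reporting slow ops
    ceph health detail

    # on the host running osd.12: inspect the recorded slow ops
    ceph daemon osd.12 dump_historic_slow_ops

    # if the ops are genuinely wedged, a daemon restart is still
    # what clears the warning
    systemctl restart ceph-osd@12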