On 06/10 14:05, Marcel Kuiper wrote:
> Hi David,
>
> That is very helpful, thank you. When looking at the graphs I notice
> that the bandwidth used appears to be very low. Or am I
> misinterpreting the bandwidth graphs?

Hey, sorry for the delay, something broke :)

Which graphs specifically are you looking at?

> Regards
>
> Marcel
>
> David Caro wrote on 2021-06-10 11:49:
> > We have a similar setup, way smaller though (~120 OSDs right now) :)
> >
> > We have different capped VMs, but most have a 500 write / 1000 read
> > IOPS cap; you can see it in effect here:
> > https://cloud-ceph-performance-tests.toolforge.org/
> >
> > We are currently running Octopus v15.2.11.
> >
> > It's a very 'bare' UI (under construction), but check the
> > 'after_ceph_upgrade_v2' run, for example the 'vm_disk' suite and the
> > 'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' or
> > 'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests
> > that hit the cap.
> >
> > From there you can also see the numbers of the tests running
> > uncapped (in the 'rbd_from_hypervisor' or 'rbd_from_osd' suites).
> >
> > You can see the current IOPS of our Ceph cluster here:
> > https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1
> >
> > Of our OpenStack setup:
> > https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?orgId=1&refresh=15m
> >
> > And some details on the traffic OpenStack puts on each Ceph OSD host
> > here:
> > https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1&refresh=5m
> >
> > We are working on revamping those graphs right now, so it might
> > become easier to see numbers in a few weeks.
> >
> >
> > We don't usually see slow ops with the current load, though we
> > recommend not using Ceph for very latency-sensitive VMs (like etcd),
> > as on the network layer there are some hardware limits we can't
> > remove right now.
> >
> > Hope that helps.
> >
> > On 06/10 10:54, Marcel Kuiper wrote:
> > > Hi
> > >
> > > We're running Ceph Nautilus 14.2.21 (going to latest Octopus in a
> > > few weeks) as the volume and instance backend for our OpenStack
> > > VMs. Our clusters run somewhere between 500 and 1000 OSDs on SAS
> > > HDDs, with NVMes as journal and DB devices.
> > >
> > > Currently we do not have our VMs capped on IOPS or throughput. We
> > > regularly get slow ops warnings (once or twice per day) and wonder
> > > whether there are other users with roughly the same setup who do
> > > throttle their OpenStack VMs.
> > >
> > > - What kind of numbers are used in the field for IOPS and
> > >   throughput limiting?
> > >
> > > - As a side question, is there an easy way to get rid of the slow
> > >   ops warning besides restarting the involved OSD? Otherwise the
> > >   warning seems to stay forever.
> > >
> > > Regards
> > >
> > > Marcel

--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share
in the sum of all knowledge. That's our commitment."
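
For context, per-VM caps like the 500 write / 1000 read IOPS mentioned
above are usually applied on the OpenStack side rather than inside
Ceph. A minimal sketch of the two common mechanisms, assuming a flavor
named 'standard' and a volume type named 'capped' (both names are
placeholders; this is not necessarily how the Wikimedia cloud does it):

    # Cap root/ephemeral disks via Nova flavor extra specs
    # (enforced by libvirt on the hypervisor):
    openstack flavor set standard \
        --property quota:disk_read_iops_sec=1000 \
        --property quota:disk_write_iops_sec=500

    # Cap Cinder volumes via a QoS spec tied to a volume type:
    openstack volume qos create iops-cap --consumer front-end \
        --property read_iops_sec=1000 --property write_iops_sec=500
    openstack volume qos associate iops-cap capped

Front-end QoS is enforced by the hypervisor, so it applies per attached
volume; existing attachments only pick up new limits after a detach/
reattach or an instance restart.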
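
The 'RunConfig(...)' entries quoted above map one-to-one onto fio
options, so the capped 4k random-read case can be reproduced against
any test file or attached disk. A sketch, where the file name, size,
and runtime are arbitrary choices and not taken from the test suite:

    fio --name=randread-qd1 --rw=randread --bs=4096 \
        --ioengine=libaio --iodepth=1 --direct=1 \
        --filename=/tmp/fio-test --size=1G --runtime=60 --time_based

With iodepth=1 this measures single-outstanding-IO latency, which is
why these runs are the ones that sit right at the IOPS cap.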
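
On the slow ops question at the end of the thread: besides restarting
the OSD, the stuck ops can at least be inspected through the daemon's
admin socket to see what they were blocked on. A sketch using standard
Ceph tooling, with osd.12 as a placeholder id:

    # list which OSDs are reporting slow ops
    ceph health detail

    # on the host running osd.12: inspect the recorded slow ops
    ceph daemon osd.12 dump_historic_slow_ops

    # if the ops are genuinely wedged, a daemon restart is still
    # what clears the warning
    systemctl restart ceph-osd@12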