Hi David,
That is very helpful, thank you. Looking at the graphs, the bandwidth
used appears to be very low. Or am I misinterpreting the bandwidth
graphs?
Regards
Marcel
David Caro wrote on 2021-06-10 11:49:
We have a similar setup, way smaller though (~120 OSDs right now) :)
We cap VMs at different levels, but most have a cap of 500 write / 1000
read IOPS; you can see it in effect here:
https://cloud-ceph-performance-tests.toolforge.org/
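(As a rough sketch, one common way to get this kind of front-end limit,
assuming libvirt/KVM hypervisors, is via nova flavor extra specs or a
cinder QoS spec attached to a volume type; the flavor, QoS and
volume-type names below are placeholders, not necessarily how we do it:)

    # nova: cap the root/ephemeral disk of instances using this flavor
    openstack flavor set capped.flavor \
        --property quota:disk_read_iops_sec=1000 \
        --property quota:disk_write_iops_sec=500

    # cinder: front-end (libvirt-enforced) caps for volumes of a given type
    openstack volume qos create vm-cap --consumer front-end \
        --property read_iops_sec=1000 --property write_iops_sec=500
    openstack volume qos associate vm-cap capped-volumes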
We are currently running Octopus v15.2.11.
It's a very 'bare' UI (under construction), but check
'after_ceph_upgrade_v2' for example: in the 'vm_disk' suite, the
'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' and
'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests
hit the cap.
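(The RunConfig parameters map more or less directly onto an fio
invocation; a minimal sketch of the capped case, run inside a guest
against its own disk, with a placeholder file name and size:)

    # roughly RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)
    fio --name=randwrite --filename=/var/tmp/fio.test --size=1G \
        --rw=randwrite --bs=4096 --ioengine=libaio --iodepth=1 \
        --direct=1 --time_based --runtime=60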
From there you can also see the numbers for the tests running uncapped
(in the 'rbd_from_hypervisor' or 'rbd_from_osd' suites).
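(For the uncapped suites, one way to run the same workload directly
against RBD from a hypervisor or OSD host is fio's rbd engine; pool,
image and client names here are placeholders:)

    # same 4k workload against an RBD image directly, bypassing the VM layer
    fio --name=rbd-randread --ioengine=rbd --clientname=admin \
        --pool=volumes --rbdname=fio-test \
        --rw=randread --bs=4096 --iodepth=1 --time_based --runtime=60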
You can see the current iops of our ceph cluster here:
https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1
And of our openstack setup here:
https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?orgId=1&refresh=15m
And some details on the traffic openstack puts on each ceph osd host
here:
https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1&refresh=5m
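(The same aggregate client I/O is also visible from the CLI, if you
just want the headline numbers without the dashboards:)

    ceph -s              # the 'client:' line shows cluster-wide read/write IOPS and throughput
    ceph osd pool stats  # per-pool client I/O rates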
We are working on revamping those graphs right now, so it might become
easier to see numbers in a few weeks.
We don't usually see slow ops with the current load, though we
recommend not using ceph for very latency-sensitive VMs (like etcd), as
there are some hardware limits on the network layer that we can't
remove right now.
Hope that helps.
On 06/10 10:54, Marcel Kuiper wrote:
Hi
We're running ceph nautilus 14.2.21 (going to octopus latest in a few
weeks) as the volume and instance backend for our openstack VMs. Our
clusters run somewhere between 500 and 1000 OSDs on SAS HDDs, with
NVMes as journal and DB devices.
Currently we do not have our VMs capped on IOPS or throughput. We
regularly get slow ops warnings (once or twice per day) and wonder
whether there are other users with roughly the same setup who do
throttle their openstack VMs.
- What kind of numbers are used in the field for IOPS and throughput
limiting?
- As a side question, is there an easy way to get rid of the slow ops
warning besides restarting the involved OSD? Otherwise the warning
seems to stay there forever.
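(For reference, a rough sketch of the restart workaround mentioned
above, plus admin-socket calls that can show which ops are or were
slow; osd.12 is just a placeholder id and the daemon commands need to
run on that OSD's host:)

    ceph health detail                          # which OSDs are reporting slow ops
    ceph daemon osd.12 dump_blocked_ops         # ops currently blocked on that OSD
    ceph daemon osd.12 dump_historic_slow_ops   # recently completed slow ops
    systemctl restart ceph-osd@12               # the restart that makes the warning go away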
Regards
Marcel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx