Re: ceph and openstack throttling experience

Hi David,

That is very helpful, thank you. When looking at the graphs I noticed that the bandwidth used appears to be very low. Or am I misinterpreting the bandwidth graphs?

Regards

Marcel

David Caro wrote on 2021-06-10 11:49:
We have a similar setup, way smaller though (~120 OSDs right now) :)

We have VMs capped at different levels, but most have a 500 write / 1000
read IOPS cap; you can see it in effect here:
https://cloud-ceph-performance-tests.toolforge.org/
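
In case it's useful: a common way to apply caps like that in OpenStack is a
front-end Cinder QoS spec attached to a volume type, plus flavor extra specs
for flavor-defined disks. This is just a generic sketch, not necessarily how
we set ours up; the 'limited-iops', 'standard' and 'myflavor' names are
placeholders:

  # Front-end (hypervisor-enforced) QoS spec: 1000 read / 500 write IOPS
  openstack volume qos create limited-iops \
      --consumer front-end \
      --property read_iops_sec=1000 \
      --property write_iops_sec=500
  # Attach it to a volume type so new volumes of that type get the cap
  openstack volume qos associate limited-iops standard

  # Equivalent caps for disks defined by the flavor
  openstack flavor set myflavor \
      --property quota:disk_read_iops_sec=1000 \
      --property quota:disk_write_iops_sec=500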

We are currently running Octopus v15.2.11.

It's a very 'bare' UI (under construction), but check
'after_ceph_upgrade_v2' for example: in the 'vm_disk' suite, the
'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' and
'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests
hit the cap.
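
Those configs map roughly onto a plain fio invocation, if anyone wants to
reproduce them from inside a VM (the test file path and size here are
arbitrary):

  # 4k random read, libaio, queue depth 1 - similar to the capped test above
  fio --name=randread-qd1 --rw=randread --bs=4096 \
      --ioengine=libaio --iodepth=1 --direct=1 \
      --size=1G --runtime=60 --time_based \
      --filename=/var/tmp/fio-testfile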

From there you can also see the numbers of the tests running uncapped
(in the 'rbd_from_hypervisor' or 'rbd_from_osd'
suites).

You can see the current IOPS of our Ceph cluster here:
https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1

And of our OpenStack setup:
https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?orgId=1&refresh=15m

And some details on the traffic OpenStack puts on each Ceph OSD host here:
https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1&refresh=5m
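
If you don't have dashboards handy, roughly the same numbers can be pulled
straight from the Ceph CLI:

  # Cluster-wide client throughput and IOPS (the 'io:' line)
  ceph -s
  # Per-pool client and recovery I/O rates
  ceph osd pool stats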

We are working on revamping those graphs right now, so it might become
easier to see numbers in a few weeks.


We don't usually see slow ops with the current load, though we
recommend not using Ceph for very latency-sensitive VMs (like etcd),
as there are some hardware limits on the network layer that we can't
remove right now.
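
If you want to gauge whether a backend is fast enough for etcd-like,
fsync-heavy workloads, the usual check is a small fdatasync latency test
along these lines (the sizes are the commonly quoted ones and the path is a
placeholder; look at the fsync/fdatasync percentiles in the output):

  # Sequential small writes with an fdatasync after each one
  fio --name=etcd-fsync --rw=write --ioengine=sync --fdatasync=1 \
      --bs=2300 --size=22m --filename=/var/lib/etcd/fio-testfile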

Hope that helps.

On 06/10 10:54, Marcel Kuiper wrote:
Hi

We're running Ceph Nautilus 14.2.21 (going to the latest Octopus in a few
weeks) as the volume and instance backend for our OpenStack VMs. Our clusters
run somewhere between 500 and 1000 OSDs on SAS HDDs, with NVMes as journal
and DB devices.
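
For reference, an OSD with that kind of data-on-HDD / DB-on-NVMe split is
typically created with something like the following (device paths are
placeholders):

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1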

Currently we do not have our VMs capped on IOPS and throughput. We
regularly get slow ops warnings (once or twice per day) and wonder whether
there are other users with roughly the same setup who do throttle their
OpenStack VMs.

- What kind of numbers are used in the field for IOPS and throughput
limiting?

- As a side question, is there an easy way to get rid of the slow ops
warning besides restarting the involved OSD? Otherwise the warning seems to
stay around forever.

Regards

Marcel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


