Fwd: [MGR] Only 60 trash removal tasks are processed per minute

sea you <seayou@xxxxxxxxx> · Tue, 6 Dec 2022 10:22:24 +0100

Hi all,

Our cluster has 12 nodes, 120 OSDs (all NVMe), and currently 4096 PGs
in total. We're testing a scenario with 20 thousand 10G volumes,
taking a snapshot of each one. Creating these 20k snapshots takes
just under 2 hours.

When we delete one snapshot of each volume (so again 20k), it usually
takes more than 2 hours to move them to the trash and create the
removal tasks.

The tasks that then remove them from the trash are pretty slow.
By my calculation, the rate is around 1 removal per second (60 per
minute). At this pace, emptying the trash of 20k entries takes about
5 and a half hours...
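
For reference, the estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the rate stays constant at the observed 1 removal per second.

```python
# Back-of-the-envelope check of the trash-removal estimate.
SNAPSHOTS = 20_000          # one snapshot per volume, as in the test scenario
REMOVALS_PER_SECOND = 1.0   # observed rate, i.e. roughly 60 tasks per minute


def drain_time_hours(tasks: int, tasks_per_second: float) -> float:
    """Return the hours needed to process `tasks` at a constant rate."""
    if tasks_per_second <= 0:
        raise ValueError("tasks_per_second must be positive")
    return tasks / tasks_per_second / 3600


if __name__ == "__main__":
    hours = drain_time_hours(SNAPSHOTS, REMOVALS_PER_SECOND)
    # prints "5.56 hours to empty the trash"
    print(f"{hours:.2f} hours to empty the trash")
```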

Looking at the https://github.com/ceph/ceph/blob/main/src/pybind/mgr/rbd_support/task.py
module, it's clear that this is a sequential operation, but is there
anything we could do to improve the speed here?
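
To put a number on what is at stake, here is a rough what-if model of the drain time if several removals could run at once. The `workers` parameter is purely hypothetical (I am not aware of such a setting in the module), and the model assumes perfect scaling, which a real cluster may well not deliver.

```python
import math


def ideal_drain_hours(tasks: int, seconds_per_task: float, workers: int) -> float:
    """Ideal-case wall time in hours with `workers` removals in flight.

    `workers` is a hypothetical knob used only for this estimate; with
    workers=1 the model matches the sequential behaviour observed today.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Each round processes up to `workers` tasks and lasts one task duration.
    rounds = math.ceil(tasks / workers)
    return rounds * seconds_per_task / 3600


if __name__ == "__main__":
    for workers in (1, 4, 8):
        hours = ideal_drain_hours(20_000, 1.0, workers)
        print(f"{workers} worker(s): {hours:.2f} h")
```

Under these assumptions, even a handful of concurrent removals would bring the total well under the two hours it takes to create the snapshots.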

Neither the MGR nor any other component is CPU or memory bound; Ceph
is basically just chilling :)

Any thoughts?

Doma
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx