We have been running several rounds of benchmarks through the RADOS Gateway. Each run creates several hundred thousand objects and a similar number of containers. The cluster consists of 4 machines with 12 OSD disks each (spinning, 4 TB), 48 OSDs in total. After running a set of benchmarks we renamed the pools used by the gateway to get a clean baseline. In total we now have several million objects and containers in 3 pools. Replication for all pools is set to 3.

Today we started deleting the benchmark data. As soon as the deletion of the first set of renamed RGW pools was executed, cluster performance started to go down the drain. Using iotop we can see that the disks are all working furiously. Since the command to delete the pools returned very quickly, our assumption is that we are now seeing the effects of the actual objects being removed asynchronously, causing lots and lots of IO activity on the disks and negatively impacting regular operations. We are running OpenStack on top of Ceph, and we see a drastic reduction in the responsiveness of these machines as well as of CephFS. Fortunately this is still a test setup, so no production systems are affected.

Nevertheless I would like to ask a few questions:

1) Is it possible to have the object deletion run in some low-priority mode?
2) If not, is there another way to delete lots and lots of objects without affecting the rest of the cluster so badly?
3) Can we somehow determine the progress of the deletion so far? We would like to estimate whether this is going to take hours, days or weeks. (I have pasted the little watcher script I am using for this below my signature.)
4) Even if that is not possible for the already running deletion, could we get progress reporting for the remaining pools we still want to delete?
5) Are there any parameters that we might tune (even if just temporarily) to speed this up? (A sketch of what I am considering is also below.)

Slide 18 of http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern describes a very similar situation.

Thanks,
Daniel
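
P.S. Here is the progress watcher mentioned under question 3. It is a minimal sketch: it assumes that "ceph df --format json" reports a "pools" array with a per-pool "stats"/"objects" count (true on the versions I have at hand, but worth verifying on yours), and it sums over all pools, so activity in pools that are still in regular use will add some noise to the rate.

#!/usr/bin/env python
"""Rough deletion progress watcher: polls total object counts via
'ceph df --format json' and extrapolates a finish time from the
observed deletion rate. A sketch, not a polished tool."""
import json
import subprocess
import time

POLL_SECONDS = 60  # coarse interval to smooth out per-PG bursts


def total_objects():
    # Assumes output shaped like {"pools": [{"stats": {"objects": N}, ...}]}
    out = subprocess.check_output(["ceph", "df", "--format", "json"])
    return sum(p["stats"]["objects"] for p in json.loads(out)["pools"])


prev = total_objects()
while True:
    time.sleep(POLL_SECONDS)
    cur = total_objects()
    deleted = prev - cur
    prev = cur
    if deleted <= 0:
        print("%d objects total, no net deletion in the last interval" % cur)
        continue
    rate = deleted / float(POLL_SECONDS)  # objects per second
    print("%d objects total, deleting %.0f obj/s, ~%.1f h to empty at this rate"
          % (cur, rate, cur / rate / 3600.0))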
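
For questions 1 and 5, this is what I am considering trying next: injecting a sleep into the removal work on each OSD at runtime. Treat it as a sketch under one big assumption, namely that the osd_delete_sleep option (time to sleep between removal transactions) exists in the running release; newer Ceph versions have it, ours may not, so I would first check with "ceph daemon osd.0 config show | grep delete". Sanity checks welcome.

#!/usr/bin/env python
"""Sketch: throttle object removal on every OSD via 'ceph tell ... injectargs'.

ASSUMPTION: the osd_delete_sleep option exists in your Ceph version;
older releases may not have it, so verify before relying on this.
"""
import subprocess

NUM_OSDS = 48          # our cluster: 4 hosts x 12 OSDs
DELETE_SLEEP = "0.1"   # seconds to sleep between removal transactions

for osd_id in range(NUM_OSDS):
    # injectargs changes the value at runtime only; it does not persist
    # across OSD restarts (put it in ceph.conf if it should).
    subprocess.check_call([
        "ceph", "tell", "osd.%d" % osd_id,
        "injectargs", "--osd_delete_sleep %s" % DELETE_SLEEP,
    ])
    print("throttled osd.%d" % osd_id)

If the option is there, the same thing can of course be done in one go with "ceph tell osd.* injectargs"; the loop just makes it explicit per OSD.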