Dan (who wrote that slide deck) is probably your best bet here, but I believe pool deletion is not very configurable and fairly expensive right now. I suspect it will get better in Hammer or Infernalis, once we have a unified op work queue through which we can independently prioritize all IO (this was a blueprint in CDS today!). Similar problems with snap trimming and scrubbing were resolved by introducing sleeps between ops, but that is a bit of a hack itself and should go away once proper IO prioritization is available.
-Greg

On Wed, Oct 29, 2014 at 8:19 AM, Daniel Schneller <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
> Bump :-)
>
> Any ideas on this? They would be much appreciated.
>
> Also: sorry for a possible double post, my client had forgotten its email
> config.
>
> On 2014-10-22 21:21:54 +0000, Daniel Schneller said:
>
>> We have been running several rounds of benchmarks through the Rados
>> Gateway. Each run creates several hundred thousand objects and similarly
>> many containers.
>>
>> The cluster consists of 4 machines with 12 OSD disks each (spinning,
>> 4 TB), 48 OSDs in total.
>>
>> After running a set of benchmarks we renamed the pools used by the
>> gateway to get a clean baseline. In total we now have several million
>> objects and containers in 3 pools. Redundancy for all pools is set to 3.
>>
>> Today we started deleting the benchmark data. Once the delete of the
>> first renamed set of RGW pools was issued, cluster performance started
>> to go down the drain. Using iotop we can see that the disks are all
>> working furiously. As the command to delete the pools returned very
>> quickly, our assumption is that we are now seeing the effects of the
>> actual objects being removed, causing lots and lots of IO activity on
>> the disks and negatively impacting regular operations.
>>
>> We are running OpenStack on top of Ceph, and we see a drastic reduction
>> in the responsiveness of these machines as well as of CephFS.
>>
>> Fortunately this is still a test setup, so no production systems are
>> affected. Nevertheless I would like to ask a few questions:
>>
>> 1) Is it possible to have the object deletion run in some low-prio mode?
>> 2) If not, is there another way to delete lots and lots of objects
>> without affecting the rest of the cluster so badly?
>> 3) Can we somehow determine the progress of the deletion so far? We
>> would like to estimate whether this is going to take hours, days or
>> weeks.
>> 4) Even if not possible for the already running deletion, could we get
>> a progress indication for the remaining pools we still want to delete?
>> 5) Are there any parameters that we might tune, even if just
>> temporarily, to speed this up?
>>
>> Slide 18 of http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
>> describes a very similar situation.
>>
>> Thanks,
>> Daniel
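
One pragmatic workaround for questions 1 to 3, until pool deletion itself can be prioritized, is to skip the bulk pool delete and remove the objects yourself at a controlled pace, which also yields a progress counter for free. Below is a minimal sketch using the python-rados bindings. The pool name, the sleep interval and the progress reporting step are made up for illustration, and it assumes the objects live in the default namespace, so treat it as a starting point rather than a recipe.

    import time
    import rados

    POOL = 'benchmark-rgw-buckets'  # hypothetical name of a renamed benchmark pool
    PAUSE = 0.01                    # seconds to sleep between deletes; tune to taste

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)

    # Current object count in the pool, used as a rough total for progress output.
    total = ioctx.get_stats()['num_objects']

    deleted = 0
    for obj in ioctx.list_objects():
        ioctx.remove_object(obj.key)
        deleted += 1
        if deleted % 10000 == 0:
            print("%s: removed %d of ~%d objects" % (POOL, deleted, total))
        time.sleep(PAUSE)           # crude throttle, same idea as the snap trim sleeps

    ioctx.close()
    cluster.delete_pool(POOL)       # pool is empty at this point, so the final delete is cheap
    cluster.shutdown()

Deleting object by object is of course much slower than letting the OSDs purge the placement groups, but the sleep keeps the IO impact bounded and the counter answers the "hours, days or weeks" question. For the pools still scheduled for deletion, "ceph df" or "rados df" shows their object counts, which at least lets you estimate how long a throttled removal would take.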