Re: Delete pools with low priority?

Dan (who wrote that slide deck) is probably your best bet here, but I
believe pool deletion is not very configurable and fairly expensive
right now. I suspect that it will get better in Hammer or Infernalis,
once we have a unified op work queue that we can independently
prioritize all IO through (this was a blueprint in CDS today!).
Similar problems with snap trimming and scrubbing were resolved by
introducing sleeps between ops, but that's a bit of a hack itself and
should be going away once proper IO prioritization is available.
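
(For what it's worth, those sleeps are plain OSD config options, so if
your release already has them you can experiment with them at runtime;
a rough sketch, values purely illustrative:

  # sleep (seconds) between individual snap trim / scrub ops
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'

That only throttles snap trimming and scrubbing, though, not the pool
deletion itself.)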
-Greg

On Wed, Oct 29, 2014 at 8:19 AM, Daniel Schneller
<daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
> Bump :-)
>
> Any ideas on this? They would be much appreciated.
>
> Also: Sorry for a possible double post, client had forgotten its email
> config.
>
> On 2014-10-22 21:21:54 +0000, Daniel Schneller said:
>
>> We have been running several rounds of benchmarks through the Rados
>> Gateway. Each run creates several hundred thousand objects and a
>> similar number of containers.
>>
>> The cluster consists of 4 machines with 12 OSD disks each (spinning,
>> 4 TB), 48 OSDs in total.
>>
>> After running a set of benchmarks we renamed the pools used by the
>> gateway to get a clean baseline. In total we now have several million
>> objects and containers in 3 pools. Redundancy for all pools is set
>> to 3.
>>
>> Today we started deleting the benchmark data. Once the delete command
>> for the first renamed set of RGW pools had been executed, cluster
>> performance started to go down the drain. Using iotop we can see that
>> the disks are all working furiously. As the delete command came back
>> very quickly, our assumption is that we are now seeing the effects of
>> the actual objects being removed in the background, causing lots and
>> lots of IO activity on the disks and negatively impacting regular
>> operations.
>>
>> We are running OpenStack on top of Ceph, and we see a drastic
>> reduction in the responsiveness of these machines as well as in CephFS.
>>
>> Fortunately this is still a test setup, so no production systems are
>> affected. Nevertheless I would like to ask a few questions:
>>
>> 1) Is it possible to have the object deletion run in some low-prio mode?
>> 2) If not, is there another way to delete lots and lots of objects
>> without affecting the rest of the cluster so badly?
>> 3) Can we somehow determine the progress of the deletion so far? We
>> would like to estimate whether this is going to take hours, days, or
>> weeks. (Our only rough idea so far is sketched below the list.)
>> 4) Even if that is not possible for the already running deletion, could
>> we get a progress indication for the remaining pools we still want to
>> delete?
>> 5) Are there any parameters that we might tune (even if just
>> temporarily) to speed this up?
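>>
>> The rough idea for 3): watch the cluster-wide usage shrink and
>> extrapolate from the rate at which it drops, assuming the space freed
>> by the deletion dominates any other changes on the cluster, e.g.
>>
>>   # re-check raw and per-pool usage every minute
>>   watch -n 60 'ceph df; rados df'
>>
>> A per-pool progress figure would obviously be much nicer.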
>>
>> Slide 18 of http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
>> describes a very similar situation.
>>
>> Thanks, Daniel
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




