Re: "rbd snap rm" overload my cluster (during backups)

On Mon, Apr 22, 2013 at 7:10 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> What version of Ceph are you running?
>
> sage

0.56.4 with a couple of backports from bobtail, but I'm not sure the
version matters - the same behaviour has been around since the early
0.5x releases.

>
> On Mon, 22 Apr 2013, Andrey Korolyov wrote:
>
>> I have observed that slow requests of up to 10-20 seconds on writes
>> may appear immediately after the creation or deletion of a snapshot
>> of a relatively large image, even though that image may be entirely
>> unused at the moment.
>>
>> On Sun, Apr 21, 2013 at 7:44 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> > Which version of Ceph are you running right now and seeing this with
>> > (Sam reworked snapshot deletion a bit for Cuttlefish, and that work
>> > was in some of the dev releases)? Snapshot deletes are a little more
>> > expensive than we'd like, but I'm surprised they're doing this badly
>> > for you. :/
>> > -Greg
>> > Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >
>> > On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet
>> > <olivier.bonvalet@xxxxxxxxx> wrote:
>> >> Hi,
>> >>
>> >> I have a backup script which, every night:
>> >> * creates a snapshot of each RBD image
>> >> * then deletes all snapshots that are more than 15 days old
>> >>
>> >> The problem is that "rbd snap rm XXX" will overload my cluster for hours
>> >> (6 hours today...).
>> >>
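For reference, a minimal sketch of such a nightly create-and-prune
rotation using the Python rados/rbd bindings might look like the code
below. The pool name 'rbd', the 'backup-YYYY-MM-DD' snapshot naming
convention and the way the snapshot age is derived from the name are
assumptions for illustration, not details taken from the original
script:

    import datetime
    import rados
    import rbd

    POOL = 'rbd'             # assumed pool name
    PREFIX = 'backup-'       # assumed snapshot naming convention
    KEEP_DAYS = 15

    today = datetime.date.today()
    cutoff = today - datetime.timedelta(days=KEEP_DAYS)

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        try:
            for img_name in rbd.RBD().list(ioctx):
                image = rbd.Image(ioctx, img_name)
                try:
                    # 1) take tonight's snapshot
                    image.create_snap(PREFIX + today.isoformat())
                    # 2) drop snapshots older than the cutoff, judging
                    #    age from the date embedded in the snapshot name
                    for snap in image.list_snaps():
                        if not snap['name'].startswith(PREFIX):
                            continue
                        snap_date = datetime.datetime.strptime(
                            snap['name'][len(PREFIX):], '%Y-%m-%d').date()
                        if snap_date < cutoff:
                            image.remove_snap(snap['name'])
                finally:
                    image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
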
>> >> Here I see several problems:
>> >> #1 "rbd snap rm XXX" is not blocking. The erase is done in the
>> >> background, and I know of no way to verify whether it has completed.
>> >> So I add "sleeps" between each rm, but I have to estimate the time it
>> >> will take.
>> >>
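As a sketch, that "sleep between removals" workaround amounts to
something like the function below; the pause length is an arbitrary
guess, which is exactly the gap described in #1 above:

    import time
    import rbd

    def remove_snaps_paced(image, snap_names, pause=600):
        """Remove snapshots one at a time, sleeping between removals to
        give the OSDs time to trim in the background. The pause length
        is only an estimate; nothing here can confirm that trimming has
        actually finished."""
        for name in snap_names:
            image.remove_snap(name)  # returns once the snap is unlinked
            time.sleep(pause)
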
>> >> #2 "rbd (snap) rm" are sometimes very very slow. I don't know if it's
>> >> because of XFS or not, but all my OSD are at 100% IO usage (reported by
>> >> iostat)
>> >>
>> >>
>> >>
>> >> So:
>> >> * is there a way to reduce the priority of "snap rm", to avoid
>> >> overloading the cluster?
>> >> * is there a way to have a blocking "snap rm" that waits until the
>> >> deletion has completed?
>> >> * is there a way to speed up "snap rm"?
>> >>
>> >>
>> >> Note that I have too low a PG count on my cluster (200 PGs for 40
>> >> active OSDs; but I'm trying to progressively migrate data to a newer
>> >> pool). Could that be the source of the problem?
>> >>
>> >> Thanks,
>> >>
>> >> Olivier
>> >>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



