Which version of Ceph are you running right now and seeing this with? (Sam reworked it a bit for Cuttlefish, and it was in some of the dev releases.) Snapshot deletes are a little more expensive than we'd like, but I'm surprised they're doing this badly for you. :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet <olivier.bonvalet@xxxxxxxxx> wrote:
> Hi,
>
> I have a backup script which, every night:
>  * creates a snapshot of each RBD image
>  * then deletes all snapshots older than 15 days
>
> The problem is that "rbd snap rm XXX" overloads my cluster for hours
> (6 hours today...).
>
> I see several problems here:
>
> #1 "rbd snap rm XXX" is not blocking. The erase is done in the
> background, and I know of no way to verify that it has completed. So I
> add sleeps between the rm calls, but I have to estimate how long each
> will take.
>
> #2 "rbd (snap) rm" is sometimes very, very slow. I don't know whether
> it's because of XFS or not, but all my OSDs are at 100% I/O usage
> (reported by iostat).
>
> So:
> * Is there a way to reduce the priority of "snap rm", to avoid
>   overloading the cluster?
> * Is there a way to have a blocking "snap rm" that waits until the
>   delete has completed?
> * Is there a way to speed up "snap rm"?
>
> Note that my cluster has too few PGs (200 PGs for 40 active OSDs; I'm
> trying to progressively migrate data to a newer pool). Could that be
> the source of the problem?
>
> Thanks,
>
> Olivier
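
For illustration, here is a minimal sketch of the nightly rotation
described in the quoted mail. The pool name, the date-based snapshot
naming, and the ten-minute pacing are assumptions for the example, not
details from the thread:

#!/bin/bash
# Minimal sketch of the nightly snapshot rotation described above.
# The pool name and the YYYYMMDD naming convention are assumptions.
POOL=rbd
CUTOFF=$(date -d '-15 days' +%Y%m%d)   # keep 15 days, per the mail
TODAY=$(date +%Y%m%d)

for IMG in $(rbd ls "$POOL"); do
    # One snapshot per image per night, named by date.
    rbd snap create "$POOL/$IMG@$TODAY"

    # Delete snapshots older than the cutoff.  "rbd snap ls" prints a
    # header line, then "ID NAME SIZE" rows; field 2 is the name.
    for SNAP in $(rbd snap ls "$POOL/$IMG" | awk 'NR > 1 {print $2}'); do
        if [[ "$SNAP" =~ ^[0-9]{8}$ ]] && (( 10#$SNAP < 10#$CUTOFF )); then
            rbd snap rm "$POOL/$IMG@$SNAP"
            sleep 600   # crude pacing between deletes, as in the mail
        fi
    done
done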
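
On the "blocking" question: "rbd snap rm" returns once the snapshot is
unlinked from the image, while the OSDs reclaim the space
asynchronously, which is why the load continues after the command
exits. A crude stand-in for a blocking remove is to poll pool usage
until it stops shrinking. In this sketch the awk field number, the
poll interval, and the stability threshold are all assumptions; check
the "rados df" column layout on your release before relying on it:

wait_for_trim() {
    # Heuristic: treat trimming as "done" once the pool's used KB has
    # been stable for three consecutive polls.  Field 2 of "rados df"
    # is commonly the used-KB column -- verify on your release.
    local pool=$1 last=-1 stable=0 used
    while (( stable < 3 )); do
        used=$(rados df | awk -v p="$pool" '$1 == p {print $2}')
        if [ "$used" = "$last" ]; then
            stable=$((stable + 1))
        else
            stable=0
        fi
        last=$used
        sleep 60
    done
}

# Usage (image and snapshot names are placeholders):
# rbd snap rm rbd/myimage@20130406 && wait_for_trim rbd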
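
On the final PG note: 200 PGs over 40 OSDs is roughly 5 per OSD, well
below the commonly cited target of around 100 PGs per OSD, so it could
plausibly concentrate the trim load on a few disks. A quick sizing
check, with placeholder pool names and replica count:

# Inspect the pg_num of an existing pool (pool name is a placeholder):
ceph osd pool get rbd pg_num

# Rule of thumb: total PGs ~= (OSDs * 100) / replicas, rounded up to a
# power of two; e.g. 40 OSDs * 100 / 3 replicas ~= 1333 -> 2048.
ceph osd pool create newpool 2048 2048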