The mentioned cluster is in production, but I can compare the number of slow requests on the test cluster with different versions and report it.

On Mon, Apr 22, 2013 at 7:24 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Mon, 22 Apr 2013, Andrey Korolyov wrote:
>> On Mon, Apr 22, 2013 at 7:10 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> > What version of Ceph are you running?
>> >
>> > sage
>>
>> 0.56.4 with a couple of backports from bobtail, but I'm not sure the
>> version matters - the same behaviour has been around since the early
>> 0.5x releases.
>
> I ask because the snapshot trimming was completely rewritten in 0.58 or
> 0.59. Is it possible to test the latest on this cluster, or is it in
> production?
>
> sage
>
>> > On Mon, 22 Apr 2013, Andrey Korolyov wrote:
>> >
>> >> I have observed that slow requests of up to 10-20 seconds on writes
>> >> may be produced immediately after creation or deletion of a snapshot
>> >> of a relatively large image, even though the image may be entirely
>> >> unused at the moment.
>> >>
>> >> On Sun, Apr 21, 2013 at 7:44 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> >> > Which version of Ceph are you running right now and seeing this with
>> >> > (Sam reworked it a bit for Cuttlefish and it was in some of the dev
>> >> > releases)? Snapshot deletes are a little more expensive than we'd
>> >> > like, but I'm surprised they're doing this badly for you. :/
>> >> > -Greg
>> >> > Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >> >
>> >> > On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet
>> >> > <olivier.bonvalet@xxxxxxxxx> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I have a backup script which, every night:
>> >> >> * creates a snapshot of each RBD image
>> >> >> * then deletes all snapshots that are more than 15 days old
>> >> >>
>> >> >> The problem is that "rbd snap rm XXX" will overload my cluster for
>> >> >> hours (6 hours today...).
>> >> >>
>> >> >> Here I see several problems:
>> >> >> #1 "rbd snap rm XXX" is not blocking. The erase is done in the
>> >> >> background, and I know of no way to verify whether it has completed.
>> >> >> So I add "sleeps" between removals, but I have to estimate the time
>> >> >> it will take.
>> >> >>
>> >> >> #2 "rbd (snap) rm" is sometimes very, very slow. I don't know whether
>> >> >> it's because of XFS or not, but all my OSDs are at 100% IO usage
>> >> >> (reported by iostat).
>> >> >>
>> >> >> So:
>> >> >> * is there a way to reduce the priority of "snap rm", to avoid
>> >> >> overloading the cluster?
>> >> >> * is there a way to have a blocking "snap rm" which will wait until
>> >> >> it has completed?
>> >> >> * is there a way to speed up "snap rm"?
>> >> >>
>> >> >> Note that my cluster has too few PGs (200 PGs for 40 active OSDs; but
>> >> >> I'm trying to progressively migrate data to a newer pool). Can this
>> >> >> be the source of the problem?
>> >> >>
>> >> >> Thanks,
>> >> >> Olivier
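
For reference, a minimal sketch of the nightly rotation Olivier describes, using only the rbd CLI (snap create / snap ls / snap rm). The pool name, the backup-YYYY-MM-DD snapshot naming scheme, the retention window and the sleep interval are all illustrative assumptions, not from the original mails; the sleep is just the crude throttle discussed in the thread, since "rbd snap rm" returns before the background snap trimming is done.

#!/usr/bin/env python
# Sketch only: create one snapshot per image, then delete snapshots older
# than the retention window, pausing between removals so the OSDs get time
# to trim. All constants below are assumptions for illustration.
import datetime
import subprocess
import time

POOL = "rbd"                 # assumption: images live in this pool
RETENTION_DAYS = 15
SLEEP_BETWEEN_RM = 600       # seconds; has to be guessed, since rm is non-blocking


def rbd(*args):
    """Run an rbd CLI command and return its decoded stdout."""
    return subprocess.check_output(("rbd",) + args).decode()


today = datetime.date.today()
cutoff = today - datetime.timedelta(days=RETENTION_DAYS)

for image in rbd("ls", POOL).split():
    # Take tonight's snapshot, e.g. rbd/vm01@backup-2013-04-22
    rbd("snap", "create", "%s/%s@backup-%s" % (POOL, image, today))

    # Remove snapshots older than the retention window, one at a time.
    for line in rbd("snap", "ls", "%s/%s" % (POOL, image)).splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].startswith("backup-"):
            continue  # skip the header line and snapshots we did not create
        snap_date = datetime.datetime.strptime(fields[1], "backup-%Y-%m-%d").date()
        if snap_date < cutoff:
            rbd("snap", "rm", "%s/%s@%s" % (POOL, image, fields[1]))
            time.sleep(SLEEP_BETWEEN_RM)  # give the cluster time to trim before the next delete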