Yep, 0.60 does all snapshot-related things much faster, and it is
noticeably faster on r/w with small blocks compared to 0.56.4. At the
same disk commit percentage I'd say the average request in-flight time
is roughly ten times lower.

On Mon, Apr 22, 2013 at 7:37 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> The mentioned cluster is in production, but I can compare the number
> of slow reqs in a test with different versions and report it.
>
> On Mon, Apr 22, 2013 at 7:24 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> On Mon, 22 Apr 2013, Andrey Korolyov wrote:
>>> On Mon, Apr 22, 2013 at 7:10 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> > What version of Ceph are you running?
>>> >
>>> > sage
>>>
>>> 0.56.4 with a couple of backports from bobtail, but I'm not sure the
>>> version matters - the same behaviour has been around since the early
>>> 0.5x releases.
>>
>> I ask because the snapshot trimming was completely rewritten in 0.58
>> or 0.59. Is it possible to test the latest release on this cluster,
>> or is it in production?
>>
>> sage
>>
>>>
>>> >
>>> > On Mon, 22 Apr 2013, Andrey Korolyov wrote:
>>> >
>>> >> I have observed that slow requests of up to 10-20 seconds on
>>> >> writes may be produced immediately after the creation or deletion
>>> >> of a snapshot of a relatively large image, even though the image
>>> >> may be entirely unused at the moment.
>>> >>
>>> >> On Sun, Apr 21, 2013 at 7:44 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> >> > Which version of Ceph are you running right now and seeing this
>>> >> > with (Sam reworked it a bit for Cuttlefish and it was in some of
>>> >> > the dev releases)? Snapshot deletes are a little more expensive
>>> >> > than we'd like, but I'm surprised they're doing this badly for
>>> >> > you. :/
>>> >> > -Greg
>>> >> > Software Engineer #42 @ http://inktank.com | http://ceph.com
>>> >> >
>>> >> > On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet
>>> >> > <olivier.bonvalet@xxxxxxxxx> wrote:
>>> >> >> Hi,
>>> >> >>
>>> >> >> I have a backup script which, every night:
>>> >> >> * creates a snapshot of each RBD image
>>> >> >> * then deletes all snapshots that are more than 15 days old
>>> >> >>
>>> >> >> The problem is that "rbd snap rm XXX" will overload my cluster
>>> >> >> for hours (6 hours today...).
>>> >> >>
>>> >> >> Here I see several problems:
>>> >> >> #1 "rbd snap rm XXX" is not blocking. The erase is done in the
>>> >> >> background, and I know of no way to verify whether it has
>>> >> >> completed. So I add "sleeps" between removals, but I have to
>>> >> >> estimate the time they will take.
>>> >> >>
>>> >> >> #2 "rbd (snap) rm" is sometimes very, very slow. I don't know
>>> >> >> whether it's because of XFS, but all my OSDs are at 100% I/O
>>> >> >> usage (as reported by iostat).
>>> >> >>
>>> >> >>
>>> >> >> So:
>>> >> >> * is there a way to reduce the priority of "snap rm", to avoid
>>> >> >> overloading the cluster?
>>> >> >> * is there a way to have a blocking "snap rm" which waits until
>>> >> >> it has completed?
>>> >> >> * is there a way to speed up "snap rm"?
>>> >> >>
>>> >> >>
>>> >> >> Note that the PG count on my cluster is too low (200 PGs for 40
>>> >> >> active OSDs; I'm progressively migrating data to a newer pool).
>>> >> >> Could that be the source of the problem?
>>> >> >>
>>> >> >> Thanks,
>>> >> >>
>>> >> >> Olivier
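
For reference, the nightly rotation Olivier describes above boils down
to something like the sketch below. This is only an illustration, not
his actual script: the pool name, the date-based snapshot naming
scheme, and the sleep duration are assumptions rather than values from
the thread. Since "rbd snap rm" returns before the OSDs finish
trimming, the sketch simply pauses between removals, which is the same
workaround Olivier already uses.

#!/usr/bin/env python
"""Hypothetical sketch of a nightly RBD snapshot rotation: create one
snapshot per image, then prune snapshots older than 15 days, sleeping
between removals because `rbd snap rm` is asynchronous."""

import datetime
import subprocess
import time

POOL = "rbd"              # assumed pool name
KEEP_DAYS = 15            # retention period from the thread
SLEEP_BETWEEN_RM = 600    # guessed pause; tune for your cluster


def rbd(*args):
    """Run an rbd CLI command and return its stdout as text."""
    return subprocess.check_output(("rbd",) + args).decode()


def images(pool):
    """List the image names in a pool."""
    return rbd("ls", "-p", pool).split()


today = datetime.date.today()
cutoff = today - datetime.timedelta(days=KEEP_DAYS)

for image in images(POOL):
    # Nightly snapshot named by date, e.g. myimage@2013-04-22.
    rbd("snap", "create", "%s/%s@%s" % (POOL, image, today.isoformat()))

    # Prune snapshots whose date-based name is older than the cutoff.
    # Skips the header line of `rbd snap ls` and assumes the second
    # column is the snapshot name.
    for line in rbd("snap", "ls", "%s/%s" % (POOL, image)).splitlines()[1:]:
        snap_name = line.split()[1]
        try:
            snap_date = datetime.datetime.strptime(snap_name, "%Y-%m-%d").date()
        except ValueError:
            continue  # ignore snapshots that don't follow the naming scheme
        if snap_date < cutoff:
            rbd("snap", "rm", "%s/%s@%s" % (POOL, image, snap_name))
            # `rbd snap rm` returns before trimming finishes, so pause
            # to spread the load, as described in the thread.
            time.sleep(SLEEP_BETWEEN_RM)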