On 01/07/2015 05:51 PM, Dan van der Ster wrote:
> Hi Wido,
> I've been trying to reproduce this but haven't been able to yet.
>
> What I've tried so far is to use fio rbd with a 0.80.7 client connected
> to a 0.80.7 cluster. I created a 10GB format 2 block device, then
> measured the 4k randwrite iops before and after having snaps. I
> measured around 2000 iops to the image before any snapshots, then
> created 200 snapshots on the device and ran fio again. Initially the
> iops were low (I guess this is from the 4MB CoW resulting from the
> first 4k write to each underlying object). But eventually the speed
> stabilized to around 2000 iops again. Actually the initial slowdown
> was the same whether I created 1 snapshot or 200.
>
> This was just a quick subjective test so far, since from your report I
> was expecting something obvious to stick out. But it appears pretty
> OK, no? Would you have expected something different from these tests?
>

Well, I'm not sure what to expect. But what I noticed is that when I
removed all the snapshots the slow requests were gone and the disk util
dropped on the OSDs.

Wido

> Cheers, Dan
>
>
> On Wed, Dec 31, 2014 at 5:21 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>> Hi,
>>
>> Last week I upgraded a 250 OSD cluster from Dumpling 0.67.10 to Firefly
>> 0.80.7 and after the upgrade there was a severe performance drop on the
>> cluster.
>>
>> It started raining slow requests after the upgrade and most of them
>> included a 'snapc' in the request.
>>
>> That led me to investigate the RBD snapshots and I found that a rogue
>> process had created ~1800 snapshots spread out over 200 volumes.
>>
>> One image even had 181 snapshots!
>>
>> As the snapshots weren't used I removed them all and after the snapshots
>> were removed the performance of the cluster came back to a normal level again.
>>
>> I'm wondering what changed between Dumpling and Firefly which caused
>> this?
>> I saw OSDs spiking to 100% disk util constantly under Firefly
>> where this didn't happen with Dumpling.
>>
>> Did something change in the way OSDs handle RBD snapshots which causes
>> them to create more disk I/O?
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
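
To put rough numbers on Dan's guess about the 4MB CoW on the first 4k write to each object, here is a minimal Python sketch. It is only a back-of-the-envelope model, not anything measured on a cluster. It assumes a fully allocated 10 GiB image, 4 MiB objects (as implied by the "4MB CoW" remark), uniformly random 4k writes, and one full-object copy the first time an object is written after the most recent snapshot. The function names are made up for this example and are not part of any Ceph tool.

```python
# Back-of-the-envelope model of the first-write copy-on-write (CoW) cost
# after a snapshot. Assumptions (not measured): 4 MiB objects, uniformly
# random 4 KiB writes, a fully allocated image, and one full-object copy
# on the first write to each object after the most recent snapshot.

GIB = 1024 ** 3
MIB = 1024 ** 2


def object_count(image_bytes, object_bytes=4 * MIB):
    """Number of objects backing a fully allocated image."""
    return -(-image_bytes // object_bytes)  # ceiling division


def expected_cow_copies(num_objects, num_writes):
    """Expected number of distinct objects hit by num_writes random writes.

    In this model each distinct object is copied exactly once, so this is
    also the expected number of CoW copies triggered so far.
    """
    return num_objects * (1.0 - (1.0 - 1.0 / num_objects) ** num_writes)


if __name__ == "__main__":
    objects = object_count(10 * GIB)
    print("objects in a 10 GiB image:", objects)
    for writes in (1000, 10000, 50000):
        copies = expected_cow_copies(objects, writes)
        print(f"{writes:>6} writes -> ~{copies:.0f} of {objects} objects copied")
```

In this simplified model the number of copies is bounded by the number of objects and does not depend on how many snapshots exist. That would be consistent with the iops recovering after a few tens of thousands of writes, and with the initial slowdown being the same for 1 snapshot or 200. It does not explain the sustained disk util seen on the production cluster.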