On Tue, Apr 18, 2017 at 11:34 AM, Peter Maloney
<peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
> On 04/18/17 11:44, Jogi Hofmüller wrote:
>> Hi,
>>
>> On Tuesday, 18.04.2017, at 13:02 +0200, mj wrote:
>>> On 04/18/2017 11:24 AM, Jogi Hofmüller wrote:
>>>> This might have been true for hammer and older versions of ceph.
>>>> From what I can tell now, every snapshot taken reduces performance
>>>> of the entire cluster :(
>>>
>>> Really? Can others confirm this? Is this a 'well-known fact'?
>>> (unknown only to us, perhaps...)
>>
>> I have to add some more/new details now. We started removing
>> snapshots for VMs today. We did this VM by VM and waited some time
>> in between while monitoring the cluster.
>>
>> After having removed all snapshots for the third VM, the cluster
>> went back to a 'normal' state again: no more slow requests. I/O
>> waits for VMs are down to acceptable numbers again (<10% peaks,
>> <5% average).
>>
>> So, either there is one VM/image that irritates the entire cluster,
>> or we reached some kind of threshold, or it's something completely
>> different.
>>
>> As for the well-known fact: Peter Maloney pointed that out in this
>> thread (mail from last Thursday).
>
> The well-known fact part was CoW, which I guess applies to all
> versions.
>
> The 'slower with every snapshot even after CoW totally flattens it'
> issue I just find easy to test. I didn't test it on hammer or
> earlier, and others confirmed it but didn't keep track of the
> versions. Just make an rbd image, map it (probably... but my tests
> were with qemu librbd), do fio randwrite tests with sync and direct
> on the device (no need for a fs or anything), and then make a few
> snaps and watch it go way slower.

I'm not sure this is a correct diagnosis or assessment.

In general, snapshots incur costs in two places: 1) on the first write
to an object after it is logically snapshotted, and 2) when removing
snapshots. There should be no long-term performance degradation,
especially on XFS, since new copies of objects are created for each
snapshot they change in. (btrfs and bluestore use block-based CoW, so
they can suffer from fragmentation if things go too badly.)

However, the costs of snapshot trimming (especially in Jewel) have
been much discussed recently. (I'll have some announcements about
improvements there soon!) So if you've got live trims happening, yes,
there's an incremental load on the cluster.

Similarly, the first write to each object after a snapshot requires
copying that object to a new location and then applying the write.
Generally, that cost should amortize into nothingness, but it sounds
like in this case you were basically doing a single IO per object for
every snapshot you created, which, yes, would be impressively slow
overall.

The reports I've seen of slow snapshots have come down to one of the
two issues above. Sometimes it's compounded by people not having
enough incremental IOPS available to support their client workload
while doing snapshots, but that doesn't mean snapshots are inherently
expensive or inefficient[1], just that they have a non-zero cost which
your cluster needs to be able to absorb.
-Greg

[1]: Although, yes, snap trimming is more expensive than in many
similar systems. There are reasons for that, which I discussed at
Vault and will present on again at the upcoming OpenStack Boston Ceph
day. :)
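
For reference, here is a minimal sketch of the test Peter describes,
assuming a kernel mapping rather than the qemu librbd path he used.
The pool ("rbd"), image name ("snaptest"), size, and fio runtimes are
all made up for illustration, and the image is created with the
layering feature only so that older krbd clients can map it:

    # create and map a throwaway test image
    rbd create rbd/snaptest --size 10240 --image-feature layering
    DEV=$(rbd map rbd/snaptest)    # prints the /dev/rbdX device

    # baseline: 4k sync+direct random writes straight to the device
    fio --name=baseline --filename="$DEV" --rw=randwrite --bs=4k \
        --ioengine=libaio --iodepth=1 --direct=1 --sync=1 \
        --runtime=60 --time_based

    # take a few snapshots, then rerun the identical job and compare
    for i in 1 2 3; do rbd snap create rbd/snaptest@snap$i; done
    fio --name=after-snaps --filename="$DEV" --rw=randwrite --bs=4k \
        --ioengine=libaio --iodepth=1 --direct=1 --sync=1 \
        --runtime=60 --time_based

The second run pays the first-write copy cost on every object it
touches after the snapshots; whether the numbers recover once every
object has been copied is the interesting part.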
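
You can also watch that first-write copy happen at the RADOS level.
A sketch under the same made-up names as above (the rbd_data.* object
naming and the grep for the offset-0 object are assumptions that may
need adjusting per image):

    # ensure the object at offset 0 exists, then find its name
    dd if=/dev/zero of="$DEV" bs=4k count=1 oflag=direct
    OBJ=$(rados -p rbd ls | grep 'rbd_data.*0000000000000000')

    rados -p rbd listsnaps "$OBJ"    # before: only the head object
    rbd snap create rbd/snaptest@cowdemo

    # first write to the object after the snapshot triggers the copy
    dd if=/dev/urandom of="$DEV" bs=4k count=1 oflag=direct
    rados -p rbd listsnaps "$OBJ"    # after: head plus a clone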
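
Finally, cleaning up those snapshots generates exactly the trim load
described above. The era-appropriate knob is osd_snap_trim_sleep,
which spaces out trim work per OSD; the 0.1s value below is an
arbitrary starting point rather than a recommendation, and on some
Jewel builds the sleep reportedly happens in an op-thread context, so
test it before leaning on it:

    # throttle trimming cluster-wide at runtime (not persistent)
    ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0.1'

    # verify on one daemon (run on that OSD's host; osd.0 is an example)
    ceph daemon osd.0 config get osd_snap_trim_sleep

    # removing the snapshots and the test image kicks off the trims
    rbd unmap "$DEV"
    rbd snap purge rbd/snaptest
    rbd rm rbd/snaptest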