On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowsk <ceph@xxxxxxxxx> wrote:
> On 08/09/2018 04:39 PM, Alex Elder wrote:
>> On 08/09/2018 08:15 AM, Sage Weil wrote:
>>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>>>> Hello,
>>>>
>>>> At OVH we're heavily utilizing snapshots for our backup system. We think
>>>> there's an interesting optimization opportunity regarding snapshots I'd like
>>>> to discuss here.
>>>>
>>>> The idea is to introduce the concept of "lightweight" snapshots - such a
>>>> snapshot would not contain data but only the information about what has
>>>> changed on the image since it was created (so basically only the object map
>>>> part of snapshots).
>>>>
>>>> Our backup solution (which seems to be a pretty common practice) is as
>>>> follows:
>>>>
>>>> 1. Create a snapshot of the image we want to back up
>>>> 2. If there's a previous backup snapshot, export diff and apply it on the
>>>>    backup image
>>>> 3. If there's no older snapshot, just do a full backup of the image
>>>>
>>>> This introduces one big issue: it forces a COW snapshot on the image,
>>>> meaning that access latencies and consumed space of the original image
>>>> increase. "Lightweight" snapshots would remove these inefficiencies - no
>>>> COW performance and storage overhead.
>>>
>>> The snapshot in 1 would be lightweight, you mean? And you'd do the backup
>>> some (short) time later based on a diff with changed extents?
>>>
>>> I'm pretty sure this will export a garbage image. I mean, it will usually
>>> be non-garbage, but the result won't be crash consistent, and in some
>>> (many?) cases won't be usable.
>>>
>>> Consider:
>>>
>>> - take reference snapshot
>>> - back up this image (assume for now it is perfect)
>>> - write A to location 1
>>> - take lightweight snapshot
>>> - write B to location 1
>>> - backup process copies location 1 (B) to target
>
> The way I (we) see it working is a bit different:
>  - take snapshot (1)
>  - data writes might occur, it's ok - CoW kicks in here to preserve data
>  - export data
>  - convert snapshot (1) to a lightweight one (not create a new one):
>    * from now on just remember which blocks have been modified instead
>      of doing CoW
>    * you can get rid of previously CoW'd data blocks (they've been
>      exported already)
>  - more writes
>  - take snapshot (2)
>  - export diff - only blocks modified since snap (1)
>  - convert snapshot (2) to a lightweight one
>  - ...
>
> That way I don't see a place for data corruption. Of course this has
> some drawbacks - you can't roll back/export data from such a lightweight
> snapshot anymore. But on the other hand we are reducing the need for
> CoW - and that's the main goal with this idea. Instead of doing CoW ~all
> the time, it's needed only while exporting the image/modified blocks.

What's the advantage of remembering the blocks changed for a
"lightweight snapshot" once the actual data diff is no longer there?
Is there a meaningful difference between this and just immediately
deleting a snapshot after doing the export?
-Greg
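To make the two sequences in this thread concrete, here is a minimal Python sketch under stated assumptions. The image is a toy in-memory map of block -> data, and ToyImage, Snapshot, take_snapshot, write, read_at, convert_to_lightweight and export_diff are hypothetical names made up for illustration; they are not part of the rbd CLI or librbd. sage_race() follows Sage's sequence, where the snapshot is lightweight from the start, so the changed extents can only be copied from the live image. pawel_scheme() follows Paweł's sequence, where each snapshot stays a full CoW snapshot until its data has been exported and only then is reduced to a set of dirty blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical toy snapshot: CoW-preserved data plus a set of dirty blocks."""
    preserved: dict = field(default_factory=dict)  # block -> data as of snapshot time
    dirty: set = field(default_factory=set)        # blocks written since the snapshot
    lightweight: bool = False                      # True: dirty tracking only, no CoW


class ToyImage:
    """Hypothetical in-memory block image for illustration; not librbd."""

    def __init__(self, nblocks):
        self.blocks = {b: None for b in range(nblocks)}
        self.snaps = {}

    def take_snapshot(self, name):
        self.snaps[name] = Snapshot()

    def write(self, block, data):
        for snap in self.snaps.values():
            # Dirty tracking starts at creation, so writes made while the
            # snapshot was still full survive its conversion to lightweight.
            snap.dirty.add(block)
            if not snap.lightweight and block not in snap.preserved:
                snap.preserved[block] = self.blocks[block]  # copy-on-write
        self.blocks[block] = data

    def read_at(self, name, block):
        snap = self.snaps[name]
        if snap.lightweight:
            raise ValueError(f"snapshot {name!r} is lightweight and holds no data")
        return snap.preserved.get(block, self.blocks[block])

    def convert_to_lightweight(self, name):
        snap = self.snaps[name]
        snap.lightweight = True
        snap.preserved.clear()  # drop CoW copies; keep only the dirty set

    def export_diff(self, from_snap, to_snap):
        # Blocks dirtied since from_snap, read as they were at to_snap.
        return {b: self.read_at(to_snap, b) for b in self.snaps[from_snap].dirty}


def sage_race():
    img = ToyImage(nblocks=2)
    img.take_snapshot("ref")
    backup = {b: img.read_at("ref", b) for b in img.blocks}  # assume a perfect full backup
    img.write(1, "A")
    img.take_snapshot("light")
    img.convert_to_lightweight("light")  # lightweight from the start: no CoW data
    img.write(1, "B")                    # lands before the backup process runs
    for b in img.snaps["ref"].dirty:     # changed extents since "ref"
        backup[b] = img.blocks[b]        # "light" holds no data, so read the live image
    return backup


def pawel_scheme():
    img = ToyImage(nblocks=2)
    img.write(0, "a0")
    img.write(1, "b0")
    img.take_snapshot("s1")              # full snapshot, CoW active
    img.write(1, "b1")                   # write after s1; CoW keeps "b0" for the export
    backup = {b: img.read_at("s1", b) for b in img.blocks}
    img.convert_to_lightweight("s1")
    img.write(0, "a2")                   # only recorded in s1's dirty set
    img.take_snapshot("s2")
    img.write(0, "a3")                   # write after s2; CoW keeps "a2" for the export
    backup.update(img.export_diff("s1", "s2"))
    img.convert_to_lightweight("s2")
    return backup


if __name__ == "__main__":
    print(sage_race())     # {0: None, 1: 'B'}: block 1 held "A" when "light" was taken
    print(pawel_scheme())  # {0: 'a2', 1: 'b1'}: the image as it was when s2 was taken
```

With the snapshot lightweight up front, the copied block carries "B", a write made after the snapshot was taken. With the convert-after-export sequence, the diff is read through s2's CoW copies and matches the image at s2 even though block 0 was overwritten again before the export ran. The toy only models block contents; it says nothing about CoW cost or how a real object map would record the dirty set.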