RBD image "lightweight snapshots"

Paweł Sadowsk <ceph@xxxxxxxxx> · Fri, 10 Aug 2018 13:53:19 +0200

On 08/09/2018 04:39 PM, Alex Elder wrote:
> On 08/09/2018 08:15 AM, Sage Weil wrote:
>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>>> Hello,
>>>
>>> At OVH we're heavily utilizing snapshots for our backup system. We think
>>> there's an interesting optimization opportunity regarding snapshots I'd like
>>> to discuss here.
>>>
>>> The idea is to introduce the concept of "lightweight" snapshots. Such a
>>> snapshot would not contain data, only the information about what has
>>> changed on the image since it was created (so basically only the object map
>>> part of a snapshot).
>>>
>>> Our backup solution (which seems to be a pretty common practice) is as
>>> follows:
>>>
>>> 1. Create snapshot of the image we want to backup
>>> 2. If there's a previous backup snapshot, export diff and apply it on the
>>> backup image
>>> 3. If there's no older snapshot, just do a full backup of image
>>>
>>> This introduces one big issue: it forces a COW snapshot on the image, meaning
>>> that access latencies and consumed space for the original image increase.
>>> "Lightweight" snapshots would remove these inefficiencies: no COW performance
>>> or storage overhead.
>>
>> The snapshot in 1 would be lightweight you mean?  And you'd do the backup 
>> some (short) time later based on a diff with changed extents?
>>
>> I'm pretty sure this will export a garbage image.  I mean, it will usually 
>> be non-garbage, but the result won't be crash consistent, and in some 
>> (many?) cases won't be usable.
>>
>> Consider:
>>
>> - take reference snapshot
>> - back up this image (assume for now it is perfect)
>> - write A to location 1
>> - take lightweight snapshot
>> - write B to location 1
>> - backup process copies location 1 (B) to target

The way I (we) see it working is a bit different:
 - take snapshot (1)
 - data write might occur, it's ok - CoW kicks in here to preserve data
 - export data
 - convert snapshot (1) to a lightweight one (not create a new one):
   * from now on just remember which blocks have been modified instead
     of doing CoW
   * you can get rid of the previously CoW'ed data blocks (they've been
     exported already)
 - more writes
 - take snapshot (2)
 - export diff - only blocks modified since snap (1)
 - convert snapshot (2) to a lightweight one
 - ...

That way I don't see any place for data corruption. Of course this has
some drawbacks: you can't roll back to, or export data from, such a
lightweight snapshot anymore. On the other hand we reduce the need for
CoW, and that's the main goal of this idea. Instead of doing CoW ~all the
time, it's needed only while the image/modified blocks are being exported.
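
To make the sequence above concrete, here is a minimal toy model in
Python. It is only a sketch: ToyImage, take_snapshot, export and
convert_to_lightweight are hypothetical names made up for illustration,
not RBD or librbd APIs, and the "image" is just a list of blocks.

```python
class ToyImage:
    """Toy block image; not the real RBD data model."""

    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks
        self.cow = None      # block -> old data, only while a full snapshot exists
        self.dirty = None    # blocks written since the latest snapshot
        self.cow_copies = 0  # how many blocks had to be preserved by CoW

    def write(self, idx, data):
        if self.cow is not None and idx not in self.cow:
            self.cow[idx] = self.blocks[idx]  # preserve snapshot-time data
            self.cow_copies += 1
        if self.dirty is not None:
            self.dirty.add(idx)               # lightweight part: remember where
        self.blocks[idx] = data

    def take_snapshot(self):
        """Start a full (CoW) snapshot; return blocks changed since the last one."""
        if self.dirty is None:
            changed = set(range(len(self.blocks)))  # no older snapshot: full backup
        else:
            changed = self.dirty
        self.cow, self.dirty = {}, set()
        return changed

    def export(self, changed):
        """Read the given blocks as they were when the snapshot was taken."""
        return {i: self.cow.get(i, self.blocks[i]) for i in sorted(changed)}

    def convert_to_lightweight(self):
        """Drop the preserved data; change tracking in self.dirty continues."""
        self.cow = None


img = ToyImage(4)
img.write(1, b"A")
backup = {}

changed = img.take_snapshot()        # snapshot (1), everything is exported
img.write(1, b"B")                   # write during export: CoW preserves A
first_export = img.export(changed)
backup.update(first_export)
img.convert_to_lightweight()

img.write(2, b"C")                   # more writes, no CoW this time
changed = img.take_snapshot()        # snapshot (2): only blocks 1 and 2 changed
backup.update(img.export(changed))
img.convert_to_lightweight()
```

In this model the first export still sees A at block 1 even though B was
written while it ran, and the only CoW copy made is the one during that
export window.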

>> That's the wrong data.  Maybe that change is harmless, but maybe location 
>> 1 belongs to the filesystem journal, and you have some records that now 
>> reference location 10 that has an A-era value, or haven't been written at 
>> all yet, and now your file system journal won't replay and you can't 
>> mount...
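
For comparison, the scenario quoted above (a snapshot that is lightweight
from the moment it is created) can be sketched the same way. This is a
toy with made-up names, with plain dicts standing in for the image and
the backup target:

```python
def export_without_cow(image, dirty):
    """Copy dirty locations from the live image (no preserved snapshot data)."""
    return {loc: image[loc] for loc in dirty}


image = {1: "old", 10: "old"}
backup = dict(image)           # reference backup, assumed perfect

image[1] = "A"                 # write A to location 1
dirty = {1}                    # lightweight snapshot: knows where, not what
at_snapshot = dict(image)      # what a crash-consistent backup should hold

image[1] = "B"                 # write B to location 1
backup.update(export_without_cow(image, dirty))
```

Here the backup ends up with B at location 1 instead of A, so it does not
match the image as of the snapshot. That is the case the export-first,
convert-afterwards ordering is meant to avoid.
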
> 
> Forgive me if I'm misunderstanding; this just caught my attention.
> 
> The goal here seems to be to reduce the storage needed to do backups of an
> RBD image, and I think there's something to that.

Storage reduction is only a side effect here. We want to get rid of CoW as
much as possible. For example, we take a snapshot every 24h, which
means that every 24h we start doing CoW from scratch on every
image. This has a big impact on cluster latency.

As for storage, with a 24h backup period we see space usage increase
by about 5% on our clusters. But this clearly depends on client
traffic.

> This seems to be no different from any other incremental backup scheme.  It's
> layered, and it's ultimately based on an "epoch" complete backup image (what
> you call the reference snapshot).
> 
> If you're using that model, it would be useful to be able to back up only
> the data present in a second snapshot that's the child of the reference
> snapshot.  (And so on, with snapshot 2 building on snapshot 1, etc.)
> RBD internally *knows* this information, but I'm not sure how (or whether)
> it's formally exposed.
> 
> Restoring an image in this scheme requires restoring the epoch, then the
> incrementals, in order.  The cost to restore is higher, but the cost
> of incremental backups is significantly smaller than doing full ones.

It depends on how we store the exported data. We might just want to merge
all diffs into the base image right after export, to keep only a single
copy. But that is out of scope of the main topic here, IMHO.
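
Just to illustrate what "merge all diffs into the base image" means, a
minimal sketch, assuming each export is a mapping from block index to
data (apply_diff is a made-up helper, not an rbd command):

```python
def apply_diff(base, diff):
    """Overwrite the blocks named in diff; return base for chaining."""
    for idx, data in diff.items():
        base[idx] = data
    return base


base = {0: b"a", 1: b"b", 2: b"c"}         # full export of the first snapshot
diffs = [{1: b"B"}, {1: b"B2", 2: b"C"}]   # incremental exports, oldest first

for d in diffs:
    apply_diff(base, d)                    # order matters: later diffs win
```

After each merge the diff itself can be dropped, so only one copy of the
image is kept on the backup side, at the price of losing the ability to
restore older points in time.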

> I'm not sure how the "lightweight" snapshot would work though.  Without
> references to objects there's no guarantee the data taken at the time of
> the snapshot still exists when you want to back it up.
> 
> 					-Alex
> 
>>
>> sage
>>  
>>> At first glance, it seems like it could be implemented as an extension to the
>>> current RBD snapshot system, leaving out the machinery required for
>>> copy-on-write. In theory it could even co-exist with regular snapshots.
>>> Removal of these "lightweight" snapshots would be instant (or near instant).
>>>
>>> So what do others think about this?
>>>
>>> -- 
>>> Piotr Dałek
>>> piotr.dalek@xxxxxxxxxxxx
>>> https://www.ovhcloud.com
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
> 