Re: RBD image "lightweight snapshots"

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 10 Aug 2018 09:24:34 -0700



On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowsk <ceph@xxxxxxxxx> wrote:
> On 08/09/2018 04:39 PM, Alex Elder wrote:
>> On 08/09/2018 08:15 AM, Sage Weil wrote:
>>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>>>> Hello,
>>>>
>>>> At OVH we're heavily utilizing snapshots for our backup system. We think
>>>> there's an interesting optimization opportunity regarding snapshots I'd like
>>>> to discuss here.
>>>>
>>>> The idea is to introduce the concept of "lightweight" snapshots: such a
>>>> snapshot would not contain data, only the information about what has
>>>> changed on the image since it was created (so basically only the object
>>>> map part of a snapshot).
>>>>
>>>> Our backup solution (which seems to be a pretty common practice) is as
>>>> follows:
>>>>
>>>> 1. Create a snapshot of the image we want to back up
>>>> 2. If there's a previous backup snapshot, export diff and apply it on the
>>>> backup image
>>>> 3. If there's no older snapshot, just do a full backup of image
>>>>
>>>> This introduces one big issue: it forces a COW snapshot on the image, so
>>>> access latency on the original image and consumed space both increase.
>>>> "Lightweight" snapshots would remove these inefficiencies: no COW
>>>> performance or storage overhead.
>>>
>>> The snapshot in 1 would be lightweight you mean?  And you'd do the backup
>>> some (short) time later based on a diff with changed extents?
>>>
>>> I'm pretty sure this will export a garbage image.  I mean, it will usually
>>> be non-garbage, but the result won't be crash consistent, and in some
>>> (many?) cases won't be usable.
>>>
>>> Consider:
>>>
>>> - take reference snapshot
>>> - back up this image (assume for now it is perfect)
>>> - write A to location 1
>>> - take lightweight snapshot
>>> - write B to location 1
>>> - backup process copies location 1 (B) to target
>
> The way I (we) see it working is a bit different:
>  - take snapshot (1)
>  - data write might occur, it's ok - CoW kicks in here to preserve data
>  - export data
>  - convert snapshot (1) to a lightweight one (not create new):
>    * from now on just remember which blocks have been modified instead
>      of doing CoW
>    * you can get rid of the previously CoW'ed data blocks (they've been
>      exported already)
>  - more writes
>  - take snapshot (2)
>  - export diff - only blocks modified since snap (1)
>  - convert snapshot (2) to a lightweight one
>  - ...
>
>
> That way I don't see a place for data corruption. Of course this has
> some drawbacks - you can't roll back to or export data from such a
> lightweight snapshot anymore. But on the other hand we are reducing the
> need for CoW - and that's the main goal of this idea. Instead of doing
> CoW ~all the time, it's needed only while exporting the image/modified
> blocks.

What's the advantage of remembering the blocks changed for a
"lightweight snapshot" once the actual data diff is no longer there?
Is there a meaningful difference between this and just immediately
deleting a snapshot after doing the export?
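
For reference, here is a toy model of the workflow quoted above, as I read
it. It is only a sketch: the Image class and every method name are made up
for illustration and have nothing to do with the real RBD/librbd API. A
"block" is just a dict entry, and blocks are never deleted.

```python
class Image:
    """Toy block image (hypothetical, not the RBD API)."""

    def __init__(self):
        self.blocks = {}       # live data: block index -> bytes
        self.snap_data = None  # CoW copies, present only for a full snapshot
        self.dirty = None      # blocks written since the last snapshot point

    def take_snapshot(self):
        """Start a full (CoW) snapshot.

        Returns the set of blocks changed since the previous snapshot
        point, or None if there was no previous snapshot.
        """
        changed = self.dirty
        self.snap_data = {}
        self.dirty = set()
        return changed

    def convert_to_lightweight(self):
        """Drop the preserved data; keep only the changed-block tracking."""
        self.snap_data = None

    def write(self, idx, data):
        if self.snap_data is not None and idx not in self.snap_data:
            # CoW: keep the snapshot-time content (None = block did not exist)
            self.snap_data[idx] = self.blocks.get(idx)
        if self.dirty is not None:
            self.dirty.add(idx)
        self.blocks[idx] = data

    def read_snapshot(self, idx):
        """Read a block as it was when the current full snapshot was taken."""
        if idx in self.snap_data:
            return self.snap_data[idx]
        return self.blocks.get(idx)

    def export(self, changed):
        """Export snapshot-time data: everything if changed is None, else a diff."""
        if changed is None:
            indices = self.blocks.keys() | self.snap_data.keys()
        else:
            indices = changed
        out = {}
        for idx in indices:
            data = self.read_snapshot(idx)
            if data is not None:
                out[idx] = data
        return out


img = Image()
img.write(0, b"base")
img.write(1, b"old")

changed = img.take_snapshot()   # snapshot (1), no predecessor -> full export
img.write(1, b"A")              # write during export, CoW preserves b"old"
backup = img.export(changed)
img.convert_to_lightweight()    # CoW copies dropped, tracking continues

img.write(2, b"new")            # no CoW here, only the index is remembered
changed = img.take_snapshot()   # snapshot (2)
img.write(1, b"B")              # write during export, CoW preserves b"A"
backup.update(img.export(changed))
img.convert_to_lightweight()
```

In this model the only state a lightweight snapshot keeps is the set of
block indices written since snapshot (1) was taken, and that set is what
the export of snapshot (2) consumes.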
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com