Re: [ceph-users] RBD image "lightweight snapshots"

On Mon, Aug 27, 2018 at 3:29 AM Bartosz Rabiega
<bartosz.rabiega@xxxxxxxxxxxx> wrote:
>
> Bumping the topic.
>
>
> So, what do you think guys?

Not sure if you saw my response from August 13th, but as I said there,
this is something you should be able to build right now using the
RADOS Python bindings and the rbd CLI. However, it would be pretty
dangerous for the average user without adding a lot of safety
guardrails around the entire process.

Of course, now that I think about it some more, I am not sure how the
OSDs would behave if sent a snap set containing a deleted snapshot.
They used to just filter out the errant entry, but I don't know how
they behave under the removed-snapshot interval set cleanup logic [1].
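
To make that concrete, here is a minimal sketch of the kind of loop that
can already be built with the Python bindings: take a new snapshot, walk
the extents that changed since the previous backup snapshot with
diff_iterate(), copy them out, and drop the old snapshot immediately so
the COW window stays short. Pool, image and snapshot names are
placeholders, and all of the safety guardrails mentioned above are
omitted:

    import rados
    import rbd

    POOL = 'rbd'                 # placeholder pool/image/snapshot names
    IMAGE = 'vm-disk-1'
    PREV_SNAP = 'backup-prev'    # snapshot the previous backup was taken from
    NEW_SNAP = 'backup-new'

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        try:
            # take the new snapshot
            with rbd.Image(ioctx, IMAGE) as image:
                image.create_snap(NEW_SNAP)

            # enumerate the extents that changed between the two snapshots
            with rbd.Image(ioctx, IMAGE, snapshot=NEW_SNAP) as snap:
                changed = []

                def record(offset, length, exists):
                    # exists=False means the range was discarded/zeroed
                    changed.append((offset, length, exists))

                snap.diff_iterate(0, snap.size(), PREV_SNAP, record)

                for offset, length, exists in changed:
                    if exists:
                        data = snap.read(offset, length)
                        # hand (offset, data) to the backup target here;
                        # `rbd export-diff` implements this loop for you

            # drop the old snapshot right away so the COW overhead only
            # lasts for the duration of the export
            with rbd.Image(ioctx, IMAGE) as image:
                image.remove_snap(PREV_SNAP)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()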

> On 08/13/2018 12:22 PM, Bartosz Rabiega wrote:
> >
> >
> > On 08/11/2018 07:56 AM, Paweł Sadowski wrote:
> >> On 08/10/2018 06:24 PM, Gregory Farnum wrote:
> >>> On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowsk <ceph@xxxxxxxxx> wrote:
> >>>> On 08/09/2018 04:39 PM, Alex Elder wrote:
> >>>>> On 08/09/2018 08:15 AM, Sage Weil wrote:
> >>>>>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> At OVH we're heavily utilizing snapshots for our backup system.
> >>>>>>> We think
> >>>>>>> there's an interesting optimization opportunity regarding
> >>>>>>> snapshots I'd like
> >>>>>>> to discuss here.
> >>>>>>>
> >>>>>>> The idea is to introduce the concept of "lightweight" snapshots -
> >>>>>>> such a snapshot would not contain data, only the information
> >>>>>>> about what has changed on the image since it was created (so
> >>>>>>> basically only the object map part of a snapshot).
> >>>>>>>
> >>>>>>> Our backup solution (which seems to be a pretty common practice)
> >>>>>>> is as
> >>>>>>> follows:
> >>>>>>>
> >>>>>>> 1. Create snapshot of the image we want to backup
> >>>>>>> 2. If there's a previous backup snapshot, export a diff and apply
> >>>>>>>    it to the backup image
> >>>>>>> 3. If there's no older snapshot, just do a full backup of the image
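
These three steps map directly onto plain rbd CLI calls - a rough sketch
in Python, with pool, image, snapshot and path names as placeholders:

    import subprocess

    POOL_IMAGE = 'rbd/vm-disk-1'     # placeholder
    PREV = 'backup-2018-08-26'       # previous backup snapshot, or None
    TODAY = 'backup-2018-08-27'

    def rbd(*args):
        subprocess.check_call(('rbd',) + args)

    # 1. create a snapshot of the image we want to back up
    rbd('snap', 'create', '%s@%s' % (POOL_IMAGE, TODAY))

    if PREV:
        # 2. a previous backup snapshot exists: export only the changes;
        #    apply the diff to the backup image with `rbd import-diff`
        rbd('export-diff', '--from-snap', PREV,
            '%s@%s' % (POOL_IMAGE, TODAY), '/backups/vm-disk-1.diff')
    else:
        # 3. no older snapshot: do a full export of the image
        rbd('export', '%s@%s' % (POOL_IMAGE, TODAY), '/backups/vm-disk-1.full')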
> >>>>>>>
> >>>>>>> This introduces one big issue: it forces COW onto the image,
> >>>>>>> meaning that access latency on the original image and the space
> >>>>>>> consumed both increase. "Lightweight" snapshots would remove these
> >>>>>>> inefficiencies - no COW performance or storage overhead.
> >>>>>> The snapshot in step 1 would be lightweight, you mean?  And you'd
> >>>>>> do the backup some (short) time later based on a diff of the
> >>>>>> changed extents?
> >>>>>>
> >>>>>> I'm pretty sure this will export a garbage image.  I mean, it
> >>>>>> will usually
> >>>>>> be non-garbage, but the result won't be crash consistent, and in
> >>>>>> some
> >>>>>> (many?) cases won't be usable.
> >>>>>>
> >>>>>> Consider:
> >>>>>>
> >>>>>> - take reference snapshot
> >>>>>> - back up this image (assume for now it is perfect)
> >>>>>> - write A to location 1
> >>>>>> - take lightweight snapshot
> >>>>>> - write B to location 1
> >>>>>> - backup process copies location 1 (B) to target
> >>>> The way I (we) see it working is a bit different:
> >>>>   - take snapshot (1)
> >>>>   - data write might occur, it's ok - CoW kicks in here to preserve
> >>>> data
> >>>>   - export data
> >>>>   - convert snapshot (1) to a lightweight one (not create a new one):
> >>>>     * from now on just remember which blocks have been modified,
> >>>>       instead of doing CoW
> >>>>     * you can get rid of the previously CoW'd data blocks (they've
> >>>>       been exported already)
> >>>>   - more writes
> >>>>   - take snapshot (2)
> >>>>   - export diff - only blocks modified since snap (1)
> >>>>   - convert snapshot (2) to a lightweight one
> >>>>   - ...
> >>>>
> >>>>
> >>>> That way I don't see any room for data corruption. Of course this has
> >>>> some drawbacks - you can't rollback/export data from such a lightweight
> >>>> snapshot anymore. But on the other hand we are reducing the need for
> >>>> CoW - and that's the main goal of this idea. Instead of doing CoW
> >>>> nearly all the time, it is needed only while exporting the image or
> >>>> the modified blocks.
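
A rough sketch of that proposed cycle - the snapshot/export/remove steps
use rbd commands that exist today, while the "convert to lightweight"
step is purely hypothetical (it is the feature being proposed in this
thread, not something Ceph offers):

    import subprocess

    IMAGE = 'rbd/vm-disk-1'          # placeholder names throughout

    def rbd(*args):
        subprocess.check_call(('rbd',) + args)

    def convert_to_lightweight(image, snap):
        # HYPOTHETICAL: the operation proposed above. It would drop the
        # snapshot's CoW data and from then on only track which blocks
        # get modified. It does not exist in Ceph today.
        raise NotImplementedError

    def backup_cycle(prev_snap, new_snap, target):
        rbd('snap', 'create', '%s@%s' % (IMAGE, new_snap))        # CoW starts
        if prev_snap:
            # export only the blocks modified since the previous snapshot
            rbd('export-diff', '--from-snap', prev_snap,
                '%s@%s' % (IMAGE, new_snap), target)
            rbd('snap', 'rm', '%s@%s' % (IMAGE, prev_snap))
        else:
            rbd('export', '%s@%s' % (IMAGE, new_snap), target)    # full export
        convert_to_lightweight(IMAGE, new_snap)                   # CoW stops here
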
> >>> What's the advantage of remembering the blocks changed for a
> >>> "lightweight snapshot" once the actual data diff is no longer there?
> >>> Is there a meaningful difference between this and just immediately
> >>> deleting a snapshot after doing the export?
> >>> -Greg
> >>
> >> The advantage is that when I need to export a diff I know which blocks
> >> changed, without having to check (read) the others, so I can export
> >> just those for backup. If I delete the snapshot after the export, next
> >> time I'll have to read the whole image again - no possibility of doing
> >> a differential backup.
> >>
> >> But as Sage wrote, we are doing this on FileStore. I don't know how
> >> BlueStore handles snapshots (are whole 4MB chunks copied, or only the
> >> area of the current write?), so performance might be much better -
> >> we need to test it.
> >>
> >> Our main goal with this idea is to improve performance in the case
> >> where all images have at least one snapshot taken every *backup
> >> period* (24h or less).
> >>
> >
> > The actual advantage lies in keeping COW to a minimum.
> >
> > Assuming that you want to do differential backups every 24h.
> >
> > With normal snapshots:
> > 1. Create snapshot A, do a full image export, takes 3h
> > 2. Typical client IO, all writes are COW for 24h
> > 3. After 24h, create snapshot B and do an export diff (A -> B), takes 0.5h
> > 4. Remove snapshot A, as it's no longer needed
> > 5. Typical client IO, all writes are COW for 24h
> > 6. After 24h, create snapshot C and do an export diff (B -> C), takes 0.5h
> > 7. Remove snapshot B, as it's no longer needed
> > 8. Typical client IO, all writes are COW for 24h
> >
> > Simplified estimation:
> > some snapshot exists (so COW is done for all writes) the entire time
> > from snapshot A onward = 72h of COW
> >
> > With 'lightweight' snapshots:
> > 1. Create snapshot A, do a full image export, takes 3h
> > 2. Convert snapshot A to lightweight
> > 3. Typical client IO, COW was done for 3h only
> > 4. After 24h, create snapshot B and do an export diff (A -> B), takes 0.5h
> > 5. Remove snapshot A, as it's no longer needed
> > 6. Convert snapshot B to lightweight
> > 7. Typical client IO, COW was done for 0.5h only
> > 8. After 24h, create snapshot C and do an export diff (B -> C), takes 0.5h
> > 9. Remove snapshot B, as it's no longer needed
> > 10. Convert snapshot C to lightweight
> > 11. Typical client IO, COW was done for 0.5h only
> >
> > Simplified estimation:
> > COW done only while a full (non-lightweight) snapshot exists -
> > 3h + 0.5h + 0.5h = 4h of COW
> >
> > The longer this goes on, the bigger the advantage.
> > I'm not sure how smart COW is with BlueStore, but for such a use case
> > 'lightweight' snapshots would probably still give big savings in COW
> > overhead (CPU + storage IO).
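
The same estimate as a quick back-of-the-envelope calculation, using the
durations from the timelines above:

    # Rough COW-duration comparison for the 3-day example above (in hours).
    full_export_h = 3.0
    diff_export_h = 0.5
    backup_period_h = 24.0
    days = 3

    # Normal snapshots: some snapshot exists the whole time, so CoW is
    # always active for client writes.
    normal_cow_h = days * backup_period_h                        # 72h

    # 'Lightweight' snapshots: CoW is only active while an export runs.
    light_cow_h = full_export_h + (days - 1) * diff_export_h     # 4h

    print(normal_cow_h, light_cow_h)                             # 72.0 4.0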
> >
> > Bartosz Rabiega

[1] http://github.com/ceph/ceph/pull/18276

-- 
Jason


