On Sat, Jan 11, 2020 at 4:55 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> Hi Neal,
>
> On Fri, 10 Jan 2020, Neal Gompa wrote:
> > Hello all,
> >
> > I don't know how many of you folks are aware, but early last year,
> > Datto (full disclosure, my current employer, though I'm sending this
> > email pretty much on my own) released a tool called "zfs2ceph" as an
> > open source project[1]. This project was the result of a week-long
> > internal hackathon (SUSE folks may be familiar with this concept from
> > their own "HackWeek" program[2]) that Datto held internally in
> > December 2018. I was a member of that team, helping with research,
> > setting up infra, and making demos for it.
> >
> > Anyway, I'm bringing it up here because I'd had some conversations
> > with some folks individually who suggested that I bring it up here on
> > the mailing list and talk about some of the motivations and what I'd
> > like to see in the future from Ceph on this.
>
> Nice!
>
> > The main motivation here was to provide a seamless mechanism to
> > transfer ZFS-based datasets with the full chain of historical
> > snapshots onto Ceph storage with as much fidelity as possible, to
> > allow a storage migration without requiring 2x-4x the system
> > resources. Datto is in the disaster recovery business, so working
> > backups with full history are extremely valuable to Datto, its
> > partners, and their customers. That's why the traditional path of
> > just syncing the current state and letting the old stuff die off is
> > not workable. At the scale of literally thousands of servers, each
> > with hundreds of terabytes of ZFS storage (adding up in aggregate to
> > hundreds of petabytes of data), there's no feasible way to consider
> > alternative storage options without having a way to transfer datasets
> > from ZFS to Ceph so that we can cut over servers to being Ceph nodes
> > with minimal downtime and near-zero new server purchasing
> > requirements (there's obviously a little bit of extra hardware needed
> > to "seed" a Ceph cluster, but that's fine).
> >
> > The current zfs2ceph implementation handles zvol sends and transforms
> > them into rbd v1 import streams. I don't recall exactly why we didn't
> > use v2, but I think there were some gaps that made it unusable for
> > our case back then (we were using Ceph Luminous). I'm unsure whether
> > this has improved since, though it wouldn't surprise me if it has.
> > However, zvols aren't enough for us. Most of
>
> I'd be surprised if there was something that v1 had that v2 didn't. Any
> other details you can remember? Jason, does this bring anything to mind?
>

One of my teammates retrieved our notes from the time we were hacking on
it, and now I have some of those details:

The Ceph export feature set was inconsistent across v1 and v2:

* the v1 format does not support multiple snapshots
* the v2 format does not support partial exports
* import/export move complete volumes (v1 or v2)
* import-diff/export-diff send diffs between snapshots (v1)

Perhaps once v2 has all the features of v1, it'd be a proper replacement.
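To make that gap concrete, here's a rough sketch (Python, shelling out to
the rbd CLI) of the export-diff/import-diff chaining you need in order to
carry a snapshot history along, since a single whole-volume export only
captures one point in time. The pool, image, and snapshot names are made
up for illustration, and this is just the generic rbd-to-rbd workflow, not
how zfs2ceph itself is wired up:

    # Rough illustration: copy an RBD image together with its snapshot
    # history by replaying incremental diffs, since a plain (v1) export
    # carries only a single point in time. All names here are made up.
    import subprocess

    SRC = "rbd/src-image"                 # hypothetical source image
    DST = "rbd/dst-image"                 # hypothetical destination image
    SNAPS = ["snap1", "snap2", "snap3"]   # oldest -> newest, assumed to exist

    def run(cmd, **kwargs):
        print("+", " ".join(cmd))
        return subprocess.run(cmd, check=True, **kwargs)

    # 1. Move the base volume as a whole (either export format can do this).
    exporter = subprocess.Popen(["rbd", "export", f"{SRC}@{SNAPS[0]}", "-"],
                                stdout=subprocess.PIPE)
    run(["rbd", "import", "-", DST], stdin=exporter.stdout)
    exporter.stdout.close()
    exporter.wait()
    run(["rbd", "snap", "create", f"{DST}@{SNAPS[0]}"])

    # 2. Replay the rest of the history as diffs between consecutive
    #    snapshots; import-diff recreates each end snapshot on the
    #    destination for us.
    prev = SNAPS[0]
    for snap in SNAPS[1:]:
        differ = subprocess.Popen(
            ["rbd", "export-diff", "--from-snap", prev, f"{SRC}@{snap}", "-"],
            stdout=subprocess.PIPE)
        run(["rbd", "import-diff", "-", DST], stdin=differ.stdout)
        differ.stdout.close()
        differ.wait()
        prev = snap

The history has to be replayed hop by hop like that, which is exactly the
part that gets awkward when the source of those diffs is a ZFS send stream
rather than another RBD image.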
> > our ZFS datasets are in the ZFS filesystem form, not the zvol block
> > device form. Unfortunately, there is no import equivalent for CephFS,
> > which blocked an implementation of this capability[3]. I had filed a
> > request about it on the issue tracker, but it was rejected on the
> > basis that something was already being worked on[4]. However, I
> > haven't seen anything exactly like what I need land in CephFS yet.
>
> Patrick would know more (copied).
>
> > The code is pretty simple, and I think it would be easy enough for it
> > to be incorporated into Ceph itself. However, there's a greater
> > question here. Is there interest from the Ceph developer community in
> > developing and supporting strategies to migrate from legacy data
> > stores to Ceph with as much fidelity as reasonably possible?
> > Personally, I hope so. My hope is that this post generates some
> > interesting conversation about how to make this a better-supported
> > capability within Ceph for block and filesystem data. :)
>
> I think including this into the relevant tools (e.g., rbd CLI) makes
> sense... as long as we can bring some tests along with it to ensure we're
> properly handling the 'zfs send' data stream.
>

That would be very cool! To make the testing side a bit more concrete,
I've tacked a rough sketch of what such a check could look like below my
sig.

-- 
真実はいつも一つ!/ Always, there's only one truth!
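Here's that sketch: a minimal "golden stream" style check, assuming a
filter-style zfs2ceph binary that reads a zvol send stream on stdin and
writes an rbd import stream on stdout (roughly how I remember it working).
The invocation and the fixture paths are placeholders of mine, not anything
from the actual repo:

    # Rough sketch of a "golden stream" test for the zfs send -> rbd import
    # conversion. The zfs2ceph invocation and fixture paths are placeholders.
    import subprocess
    import unittest
    from pathlib import Path

    # A canned `zfs send` stream captured once from a small test zvol, and
    # the rbd import stream we expect the converter to emit for it.
    ZVOL_SEND_FIXTURE = Path("fixtures/small-zvol.zfssend")
    EXPECTED_RBD_STREAM = Path("fixtures/small-zvol.rbdimport")

    class ZfsSendConversionTest(unittest.TestCase):
        def test_conversion_matches_golden_stream(self):
            # Feed the canned send stream to the converter on stdin and
            # capture whatever it writes to stdout.
            with ZVOL_SEND_FIXTURE.open("rb") as send_stream:
                result = subprocess.run(
                    ["zfs2ceph"],          # assumed filter-style invocation
                    stdin=send_stream,
                    stdout=subprocess.PIPE,
                    check=True,
                )
            # Byte-for-byte comparison against the previously blessed
            # output, so any change in the translation shows up immediately.
            self.assertEqual(result.stdout, EXPECTED_RBD_STREAM.read_bytes())

    if __name__ == "__main__":
        unittest.main()

A fuller test would want to round-trip the converted stream through an
actual 'rbd import' against a local vstart cluster and compare the data
read back, but even a byte-for-byte comparison against a blessed output
would catch regressions in the stream translation.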