Hello all,

I don't know how many of you are aware, but early last year Datto (full disclosure: my current employer, though I'm sending this email pretty much on my own) released a tool called "zfs2ceph" as an open source project[1]. The project came out of a week-long hackathon (SUSE folks may recognize the concept from their own "HackWeek" program[2]) that Datto held internally in December 2018. I was a member of that team, helping with research, setting up infrastructure, and building demos.

I'm bringing it up here because a few people I'd spoken with individually suggested I raise it on this mailing list and talk about the motivations behind it and what I'd like to see from Ceph going forward.

The main motivation was to provide a seamless mechanism for transferring ZFS-based datasets, with their full chain of historical snapshots, onto Ceph storage with as much fidelity as possible, so that a storage migration doesn't require 2x-4x the system resources. Datto is in the disaster recovery business, so working backups with full history are extremely valuable to Datto, its partners, and their customers. That's why the traditional path of just syncing the current state and letting the old data age out is not workable for us.

At the scale of literally thousands of servers, each holding hundreds of terabytes of ZFS storage (hundreds of petabytes of data in aggregate), there's no feasible way to consider alternative storage options without a way to transfer datasets from ZFS to Ceph, so that we can cut servers over to being Ceph nodes with minimal downtime and near-zero new server purchases (a little extra hardware is obviously needed to "seed" a Ceph cluster, but that's fine).

The current zfs2ceph implementation handles zvol sends and transforms them into rbd v1 import streams (a rough sketch of what that framing looks like is in the P.S. below). I don't recall exactly why we didn't use v2, but I think there were some gaps that made it unusable for our case at the time (we were on Ceph Luminous). I'm not sure whether that has improved since, though it wouldn't surprise me if it has.

However, zvols aren't enough for us. Most of our ZFS datasets are filesystems, not zvol block devices. Unfortunately, there is no import equivalent for CephFS, which blocked an implementation of that capability[3]. I had filed a request about it on the issue tracker, but it was rejected on the basis that something similar was already being worked on[4]. However, I haven't yet seen anything quite like what I need land in CephFS.

The code is pretty simple, and I think it would be easy enough to incorporate into Ceph itself. But there's a larger question here: is there interest from the Ceph developer community in developing and supporting strategies for migrating from legacy data stores to Ceph with as much fidelity as reasonably possible? Personally, I hope so. My hope is that this post generates some interesting conversation about how to make this a better-supported capability within Ceph for both block and filesystem data. :)

Best regards,
Neal

[1]: https://github.com/datto/zfs2ceph
[2]: https://hackweek.suse.com/
[3]: https://github.com/datto/zfs2ceph/issues/1
[4]: https://tracker.ceph.com/issues/40390

--
真実はいつも一つ!/ Always, there's only one truth!
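P.S. For anyone curious what the per-snapshot framing looks like, here's a minimal sketch in Python of the "rbd diff v1" stream that `rbd import-diff` consumes, as described in Ceph's doc/dev/rbd-diff.rst. To be clear, this is not zfs2ceph's actual code; the toy extent in the demo is just a stand-in for the DRR_WRITE payloads that zfs2ceph extracts from a real `zfs send` stream, which is the hard part.

import struct
import sys

def emit_rbd_diff_v1(out, image_size, to_snap, extents):
    """Write an "rbd diff v1" stream, the framing that `rbd import-diff`
    consumes (see Ceph's doc/dev/rbd-diff.rst). Integers are little-endian.
    `extents` yields (offset, data) tuples for changed regions."""
    out.write(b"rbd diff v1\n")
    name = to_snap.encode()
    out.write(b"t" + struct.pack("<I", len(name)) + name)  # 't': to-snap name
    out.write(b"s" + struct.pack("<Q", image_size))        # 's': image size
    for offset, data in extents:
        # 'w': an updated data extent (offset, length, payload)
        out.write(b"w" + struct.pack("<QQ", offset, len(data)) + data)
    out.write(b"e")                                        # 'e': end of stream

if __name__ == "__main__":
    # Toy demo: a single 4 KiB extent at offset 0 of a 1 GiB image.
    # In zfs2ceph, these extents come from walking the DRR_WRITE
    # records of a `zfs send` stream.
    emit_rbd_diff_v1(sys.stdout.buffer,
                     image_size=1 << 30,
                     to_snap="snap1",
                     extents=[(0, b"\x00" * 4096)])

Something like `python3 translate.py | rbd import-diff - pool/image` would then replay the diff onto an existing image (the image, and any from-snapshot named in an 'f' record, have to exist already, which is part of why the initial full send is the trickier case).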