Darren Soothill (darren.soothill) writes:

> Hi Fabien,
>
> ZFS on top of RBD really makes me shudder. ZFS expects to have
> individual disk devices that it can manage. It thinks it has them with
> this config, but Ceph is masking the real data behind it.
>
> As has been said before, why not just use Samba directly from CephFS
> and remove that layer of complexity in the middle?

As a user of ZFS on Ceph, I can explain some of our motivation.

As was pointed out earlier in this thread, CephFS will give you snapshots but not diffs between them. I don't know what the intent was with using diffs, but in ZFS' case snapshots provide a basis for checkpointing/recovery and instant dataset cloning, but also for replication/offsite mirroring (although not synchronous) - so you could easily back up/replicate the ZFS datasets to another location that doesn't necessarily have a Ceph installation (say, a big, cheap JBOD box with SMR drives running native ZFS). And you can diff between snapshots to see instantly which files were modified (rough command examples at the end of this post). That's in addition to the other benefits of running ZFS, such as lz4 compression (per dataset), deduplication, etc.

While it's true that ZFS on top of RBD is not optimal, it's not particularly dangerous or unreliable. You provide it with multiple RBDs and create a pool out of those (a ZFS pool, not a Ceph pool :). It sees each RBD as an individual disk and can issue I/O to them independently. If anything, you lose some of the benefits of ZFS - namely automatic error correction - but everything is still checksummed, so you still detect corruption. I already run ZFS within a VM (all our customers are hosted like this, using LXD or FreeBSD jails); whether the backing store is NFS, local disk or RBD doesn't really matter.

So why NOT run ZFS on top of RBD? Complexity mostly, and some measure of lost performance... But CephFS isn't exactly simple stuff to run in a reliable manner as of yet either (MDS performance and possible deadlocks are an issue).

If you're planning on serving files, you're still going to need an NFS or SMB layer. If you're on CephFS, you can serve via Ganesha or Samba without the extra ZFS layer (which would add latency); but either way you're still going to drag the data out of CephFS to the client mounting the FS and export it via Samba/NFS. If instead you attach, say, 10 x 1 TB RBD images to a host, assemble those into a ZFS pool, and run NFS or Samba on top of that, you'll have more or less the same data path, except that you'll additionally be going through ZFS, which introduces latency.

Now, if you're daring, you create a Ceph pool with size=1, min_size=1 (will Ceph let you do that? :), you map RBDs out of that, hand them over to ZFS in a striped-mirror config (or raidz2) - and let ZFS deal with failing vdevs by giving it new RBDs to replace them. Sounds crazy? Well, you lose the benefit of Ceph's self-healing, but you still get a super-scalable ZFS running on a near-limitless supply of JBOD :) And you can quickly set up different (ZFS) pools with different levels of redundancy, quotas, compression, metadata options, etc.

Who says you can't do both anyway (CephFS and ZFS)? Ceph is flexible enough...
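
For illustration, here is roughly what that snapshot/diff/replication workflow looks like. The dataset and host names (tank/data, backuphost, backup/data) are just placeholders, not anything from our setup:

  # take periodic snapshots of a dataset
  zfs snapshot tank/data@monday
  zfs snapshot tank/data@tuesday

  # instantly list which files changed between the two snapshots
  zfs diff tank/data@monday tank/data@tuesday

  # incremental replication to a box that knows nothing about Ceph
  zfs send -i tank/data@monday tank/data@tuesday | \
      ssh backuphost zfs receive backup/data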
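
And the "ZFS pool out of RBDs" part, sketched with made-up pool/image names (on older Ceph releases --size wants a plain number in MB rather than a suffix like 1T):

  # create and map a few RBD images; each shows up as /dev/rbdN
  rbd create rbdpool/zdisk0 --size 1T
  rbd map rbdpool/zdisk0
  # repeat for zdisk1..zdisk3

  # ZFS then treats each mapped RBD as an individual disk
  zpool create tank /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3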
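
The "daring" variant would go roughly like this. Pool and device names are placeholders, and the guard rails vary by release - newer Ceph versions make you confirm size=1 explicitly (and may also want mon_allow_pool_size_one=true):

  # a replicated pool with a single copy - ZFS supplies the redundancy
  ceph osd pool create zfsback 128 128 replicated
  ceph osd pool set zfsback size 1 --yes-i-really-mean-it
  ceph osd pool set zfsback min_size 1
  rbd pool init zfsback

  # map six RBDs out of it and let raidz2 handle failures
  zpool create tank raidz2 /dev/rbd0 /dev/rbd1 /dev/rbd2 \
      /dev/rbd3 /dev/rbd4 /dev/rbd5

  # when a vdev fails, just hand ZFS a freshly mapped RBD
  zpool replace tank /dev/rbd3 /dev/rbd6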
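
And the per-dataset knobs are one-liners (again, dataset names are just examples):

  zfs set compression=lz4 tank/shares
  zfs set dedup=on tank/shares            # mind the RAM cost
  zfs set quota=500G tank/shares/team-a
  zfs set copies=2 tank/important         # extra redundancy for one dataset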