Hi Christian,

On 09/09/2014 6:33 PM, "Christian Balzer" <chibi at gol.com> wrote:
> I have nearly no experience with ZFS, but I'm wondering why you'd pool
> things at that level when Ceph is already supplying a redundant and
> resizeable block device.

That's really subject to further testing. At this stage I'm just guessing that multiple vdevs (a one-to-one rbd-vdev mapping) may give ZFS more opportunity to parallelise the workload across the cluster, and if we do need to expand a pool it's obvious we could just add vdevs in xTB blocks rather than growing the existing non-redundant vdevs (I don't even know if that is possible). I'm sketchy on how that works in ZFS, so some testing will be needed to determine whether it's really the best option. (There's a rough sketch of the mapping I have in mind below my sig.)

The reason for leaning towards ZFS is inline compression plus native read-cache and write-log device support. The rich set of dataset-level features also doesn't hurt.

> Using a CoW filesystem on top of RBD might not be a great idea either,
> since it is sparsely allocated, performance is likely to be bad until all
> "blocks" have been actually allocated. Maybe somebody with experience in
> that can pipe up.

That's an interesting observation, though I must admit I'm struggling to visualise the problem.

<SNIP>

> Another scenario might be running the NFS heads on VMs, thus using librbd
> and having TRIM (with the correct disk device type). And again use
> pacemaker to quickly fail over things.

Ah yes, I forgot to mention plans for KVM-based presentation servers in order to get librbd rather than krbd - that's a good point. I hadn't specifically thought about TRIM, but rather just the general lag of the kernel. (Those VMs would have PCI pass-through for the latency-sensitive devices - vNIC, ZIL, L2ARC.)

Also planning nightly backups of these filesystems to tape via TSM (using the agent journal, which seems to work okay with ZoL from basic tests).

Cheers,

~Blairo
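
PS: In case it helps make the vdev idea concrete, below is roughly what I'm picturing for the one-to-one rbd-vdev mapping. It's only a sketch (untested, and the Ceph pool, zpool and image names are invented), not something we've actually deployed:

#!/usr/bin/env python
# Rough sketch only (untested; pool/image names are made up): map a handful
# of RBD images with krbd and build one zpool with each mapped device as its
# own top-level vdev, leaving redundancy to Ceph.
import subprocess

CEPH_POOL = "rbd"                                  # Ceph pool holding the images
ZPOOL = "tank"                                     # hypothetical ZFS pool name
IMAGES = ["nfs-vdev-%d" % i for i in range(4)]     # hypothetical RBD image names

devices = []
for image in IMAGES:
    # Recent rbd versions print the mapped device path (e.g. /dev/rbd0);
    # older ones may need "rbd showmapped" to find it instead.
    dev = subprocess.check_output(
        ["rbd", "map", "%s/%s" % (CEPH_POOL, image)]).decode().strip()
    devices.append(dev)

# One pool, each RBD image a separate non-redundant top-level vdev, so ZFS
# can stripe I/O across them; lz4 compression enabled on the root dataset.
subprocess.check_call(
    ["zpool", "create", "-O", "compression=lz4", ZPOOL] + devices)

Expansion would then just be another "rbd create" / "rbd map" followed by "zpool add" - whether that actually behaves well is exactly the part I still need to test.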