Re: use ZFS for OSDs

Michal Kozanecki <mkozanecki@xxxxxxxxxx> · Fri, 31 Oct 2014 16:32:49 +0000

I'll test this by manually injecting corrupted data into the ZFS filesystem. I'll then report back on how ZFS and Ceph interact when a file failure or corruption is detected, how the cluster recovers, and any manual steps required.

As for compression: with lz4 the CPU impact is around 5-20%, depending on load, I/O type and I/O size. There is little to no I/O performance impact, and in some cases I/O performance actually increases. I'm currently seeing a compression ratio of around 30-35% on the ZFS datasets, whose data consists of RBD-backed OpenStack KVM VMs. I have not tried dedup, as it is memory-intensive and I only had 24GB of RAM on each node. I'll gather some FIO benchmarks and report back.
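ZFS reports compression through the read-only `compressratio` property as uncompressed size over compressed size (for example `1.50x`), while the figure above is given as a percentage. The short Python sketch below converts between the two forms. It assumes that "30-35%" means space saved. On that reading, 30-35% savings works out to roughly a 1.43x to 1.54x `compressratio`. The function names are illustrative only.

```python
def savings_to_ratio(savings: float) -> float:
    """Fractional space saved (0.30 means 30%) -> uncompressed/compressed ratio."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


def ratio_to_savings(ratio: float) -> float:
    """Uncompressed/compressed ratio (1.50 means 1.50x) -> fractional space saved."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1.0 for this sketch")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Assumption: the quoted 30-35% is space saved, not the ratio itself.
    for saved in (0.30, 0.35):
        print(f"{saved:.0%} saved corresponds to a {savings_to_ratio(saved):.2f}x ratio")
```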

Cheers,

-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx] 
Sent: October-30-14 4:12 AM
To: ceph-users
Cc: Michal Kozanecki
Subject: Re: use ZFS for OSDs

On Wed, 29 Oct 2014 15:32:57 +0000 Michal Kozanecki wrote:

[snip]
> With Ceph handling the
> redundancy at the OSD level, I saw no need for ZFS mirroring or
> raidz. Instead, if ZFS detects corruption, rather than self-healing it
> returns a read failure for the PG file to Ceph, and Ceph's scrub
> mechanisms should then repair/replace the PG file using a good replica
> elsewhere on the cluster. ZFS + Ceph are a beautiful bitrot-fighting
> match!
> 
Could you elaborate on that? 
AFAIK Ceph currently has no way to determine which of the replicas is "good". One such failed PG object will require you to do a manual repair after the scrub and hope that the two surviving replicas (assuming a size of 3) are identical. If not, start tossing a coin.
Ideally Ceph would have a way to know what happened (as in, it's a checksum and not a real I/O error) and do a rebuild of that object itself.
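The difference between the two positions can be shown with a toy Python model. It is not Ceph's or ZFS's actual code. Each replica keeps a checksum recorded at write time, standing in for ZFS block checksums, and a mismatch on read raises `OSError(EIO)`, mimicking ZFS returning a read error instead of self-healing. Replicas that fail this check can be discarded outright. Among the copies that still read cleanly, the sketch falls back to the "hope the survivors are identical" majority vote described above, and returns `None` on a tie (the coin toss). `Replica`, `write_replica`, `pick_authoritative` and `repair` are hypothetical names.

```python
import errno
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    osd: str
    data: bytes
    stored_checksum: str  # recorded at write time; stands in for a ZFS block checksum

    def read(self) -> bytes:
        """Return the data, or raise EIO if it no longer matches its checksum."""
        if hashlib.sha256(self.data).hexdigest() != self.stored_checksum:
            raise OSError(errno.EIO, f"checksum mismatch on {self.osd}")
        return self.data


def write_replica(osd: str, data: bytes) -> Replica:
    return Replica(osd, data, hashlib.sha256(data).hexdigest())


def pick_authoritative(replicas: List[Replica]) -> Optional[bytes]:
    """Drop replicas that fail their checksum, then majority-vote the rest."""
    readable = []
    for r in replicas:
        try:
            readable.append(r.read())
        except OSError:
            continue  # known-bad copy: the filesystem reported it
    if not readable:
        return None
    ranked = Counter(readable).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no majority among readable copies: the "coin toss" case
    return ranked[0][0]


def repair(replicas: List[Replica]) -> bool:
    """Rewrite every replica from the authoritative copy, if one can be chosen."""
    good = pick_authoritative(replicas)
    if good is None:
        return False
    for i, r in enumerate(replicas):
        replicas[i] = write_replica(r.osd, good)
    return True


if __name__ == "__main__":
    pg = [write_replica(f"osd.{i}", b"object payload") for i in range(3)]
    pg[1].data = b"object paylaod"  # simulate bitrot underneath the checksum
    print(repair(pg), [r.read() for r in pg])
```

With a checksum-backed read error, the corrupted copy identifies itself, so the choice of a good replica no longer depends on the surviving copies happening to agree.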

On another note, have you done any tests using ZFS compression?
I'm wondering what the performance impact and efficiency are.

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com