Forgot to mention: when you create the ZFS/zpool datasets, make sure to set the xattr property to sa, e.g.

    zpool create osd01 -O xattr=sa -O compression=lz4 sdb

or, if the zpool/ZFS dataset is already created:

    zfs set xattr=sa osd01

Cheers

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Michal Kozanecki
Sent: October-29-14 11:33 AM
To: Kenneth Waegeman; ceph-users
Subject: Re: use ZFS for OSDs

Hi Kenneth,

I run a small Ceph test cluster using ZoL (ZFS on Linux) on top of CentOS 7, so I'll try to answer any questions. :)

Yes, ZFS writeparallel support is there, but NOT compiled in by default. You'll need to compile Ceph with libzfs support (the --with-libzfs configure option), but that by itself will fail to compile the ZFS support, as I found out. You need to ensure you have ZoL installed and working, and then pass the location of libzfs to Ceph at compile time. Personally, I just set my environment variables before compiling, like so:

    ldconfig
    export LIBZFS_LIBS="/usr/include/libzfs/"
    export LIBZFS_CFLAGS="-I/usr/include/libzfs -I/usr/include/libspl"

However, the writeparallel performance isn't all that great. Writeparallel mode makes heavy use of ZFS's (and btrfs's, for that matter) snapshot capability, and snapshot performance on ZoL, at least when I last tested it, is pretty poor. You lose any performance benefit you gain from writeparallel to the slow snapshots.

If you decide that you don't need writeparallel mode, you can use the prebuilt packages (or compile with default options) without issue. Ceph without libzfs support compiled in will detect ZFS as a generic/ext4 filesystem and work accordingly.

As far as performance tweaking goes (ZIL, write journals, etc.), I found that performance with a ZIL versus a Ceph write journal was about the same, and doing both (ZIL AND write journal) didn't give me much of a benefit either. In my small test cluster I decided, after testing, to forgo the ZIL and use only an SSD-backed Ceph write journal on each OSD, with each OSD being a single ZFS dataset/vdev (no raidz or mirroring).

With Ceph handling the redundancy at the OSD level, I saw no need for ZFS mirroring or raidz. Instead, if ZFS detects corruption, rather than self-healing it returns a read failure of the PG file to Ceph, and Ceph's scrub mechanisms should then repair/replace the PG file using a good replica elsewhere on the cluster. ZFS + Ceph are a beautiful bitrot-fighting match!
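To pull the OSD layout together, each of my OSDs looks roughly like this. A minimal sketch only: the pool name and device come from the example above, but the mountpoint and the journal partition label are illustrative, not a prescription:

    # one pool per disk, one dataset per OSD, no raidz or mirroring
    zpool create -O xattr=sa -O compression=lz4 osd01 sdb
    zfs set mountpoint=/var/lib/ceph/osd/ceph-0 osd01
    zfs get xattr,compression osd01    # sanity-check the properties

    # ceph.conf: SSD-backed write journal for that OSD
    [osd.0]
        osd journal = /dev/disk/by-partlabel/ceph-journal-0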
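If you want to try the writeparallel build yourself, the rough sequence I used from a source checkout was something like the below. The --with-libzfs flag name is from memory, so verify it against ./configure --help for your Ceph version:

    # assumes ZoL is already installed and working
    ldconfig
    export LIBZFS_LIBS="/usr/include/libzfs/"
    export LIBZFS_CFLAGS="-I/usr/include/libzfs -I/usr/include/libspl"
    ./autogen.sh
    ./configure --with-libzfs
    make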
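And if you do go the writeparallel route, the parameters you found in your original mail match what I used; in ceph.conf form that's roughly this (double-check the option names against your version):

    [osd]
        filestore zfs_snap = 1
        journal_aio = 0
        journal_dio = 0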
Let me know if there's anything else I can answer.

Cheers

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Kenneth Waegeman
Sent: October-29-14 6:09 AM
To: ceph-users
Subject: use ZFS for OSDs

Hi,

We are looking to use ZFS for our OSD backend, but I have some questions.

My main question is: does Ceph already support the writeparallel mode for ZFS? (As described here: http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)

I've found this, but I suppose it is outdated: https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs

Should Ceph be built with ZFS support? I found a --with-zfslib option somewhere, but can someone verify this, or better, share instructions for it? :-)

What parameters should be tuned to use this? I found these:

    filestore zfs_snap = 1
    journal_aio = 0
    journal_dio = 0

Are there other things we need for it?

Many thanks!!

Kenneth

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com