Re: use ZFS for OSDs

Hi Stijn,

Yes, on my cluster I am running CentOS 7, ZoL 0.6.3, and Ceph 0.80.5.

Cheers


-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Stijn De Weirdt
Sent: October-29-14 3:49 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  use ZFS for OSDs


hi michal,

thanks for the info. we will certainly try it and see if we come to the same conclusions ;)

one small detail: since you were using centos7, i'm assuming you were using ZoL 0.6.3?

stijn

On 10/29/2014 08:03 PM, Michal Kozanecki wrote:
> Forgot to mention, when you create the ZFS/zpool datasets, make sure 
> to set the xattr property to sa
>
> e.g.
>
> zpool create -O xattr=sa -O compression=lz4 osd01 sdb
>
> OR, if the zpool/zfs dataset is already created:
>
> zfs set xattr=sa osd01
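>
> To double-check that the property took effect (just a quick verification example, not required):
>
> zfs get xattr osd01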
>
> Cheers
>
>
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf 
> Of Michal Kozanecki
> Sent: October-29-14 11:33 AM
> To: Kenneth Waegeman; ceph-users
> Subject: Re:  use ZFS for OSDs
>
> Hi Kenneth,
>
> I run a small ceph test cluster using ZoL (ZFS on Linux) on top of 
> CentOS 7, so I'll try and answer any questions. :)
>
> Yes, ZFS writeparallel support is there, but it is NOT compiled in by 
> default. You'll need to build Ceph with --with-libzfs, but that flag by 
> itself will fail to compile the ZFS support, as I found out. You need 
> to make sure you have ZoL installed and working, and then pass the 
> location of libzfs to Ceph at compile time. Personally, I just set my 
> environment variables before compiling, like so:
>
> ldconfig
> export LIBZFS_LIBS="/usr/include/libzfs/"
> export LIBZFS_CFLAGS="-I/usr/include/libzfs -I/usr/include/libspl"
>
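> With ZoL installed and those variables set, the build itself is then the usual autotools sequence; roughly (a sketch only, the exact flags you need may differ on your tree):
>
> ./autogen.sh
> ./configure --with-libzfs
> make
>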
> However, the writeparallel performance isn't all that great. The writeparallel mode makes heavy use of ZFS's (and Btrfs's, for that matter) snapshotting capability, and snapshot performance on ZoL, at least when I last tested it, is pretty terrible. You lose any performance benefit you gain from writeparallel to the poor snapshot performance.
>
> If you decide that you don't need writeparallel mode, you can use the prebuilt packages (or compile with default options) without issue. Ceph (without libzfs support compiled in) will detect ZFS as a generic/ext4 file system and work accordingly.
>
> As far as performance tweaking, ZIL, write journals and so on: I found that performance with a ZIL vs. a Ceph write journal is about the same, and doing both (ZIL AND write journal) didn't give me much of a benefit either. In my small test cluster I decided, after testing, to forego the ZIL and only use an SSD-backed Ceph write journal on each OSD, with each OSD being a single ZFS dataset/vdev (no zraid or mirroring).
>
> With Ceph handling redundancy at the OSD level, I saw no need for ZFS mirroring or zraid. Instead, if ZFS detects corruption it returns a read failure on the PG file to Ceph rather than self-healing, and Ceph's scrub mechanisms should then repair/replace the PG file using a good replica elsewhere on the cluster. ZFS + Ceph are a beautiful bitrot-fighting match!
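>
> To make that concrete, here is roughly what one OSD looks like in that setup (the pool name, device names and journal path below are just examples):
>
> # one single-vdev pool per OSD, no zraid/mirroring
> zpool create -O xattr=sa -O compression=lz4 osd01 /dev/sdb
>
> # ceph.conf fragment pointing that OSD's write journal at an SSD partition
> [osd.1]
>     osd journal = /dev/disk/by-partlabel/osd1-journal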
>
> Let me know if there's anything else I can answer.
>
> Cheers
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf 
> Of Kenneth Waegeman
> Sent: October-29-14 6:09 AM
> To: ceph-users
> Subject:  use ZFS for OSDs
>
> Hi,
>
> We are looking to use ZFS for our OSD backend, but I have some questions.
>
> My main question is: does Ceph already support the writeparallel mode for ZFS? (As described here:
> http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)
> I've found this, but I suppose it is outdated:
> https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs
>
> Should Ceph be built with ZFS support? I found a --with-zfslib option 
> somewhere, but can someone verify this, or better, provide instructions
> for it? :-)
>
> What parameters should be tuned to use this?
> I found these (sketched as an [osd] section below):
>       filestore zfs_snap = 1
>       journal_aio = 0
>       journal_dio = 0
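>
> For reference, here is how I would put those in ceph.conf (just a sketch; I'm not sure these are right, hence the question):
>
> [osd]
>     filestore zfs_snap = 1
>     journal_aio = 0
>     journal_dio = 0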
>
> Are there other things we need for it?
>
> Many thanks!!
> Kenneth
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com