On 1-4-2017 21:59, Wido den Hollander wrote:
>
>> Op 31 maart 2017 om 19:15 schreef Willem Jan Withagen <wjw@xxxxxxxxxxx>:
>>
>>
>> On 31-3-2017 17:32, Wido den Hollander wrote:
>>> Hi Willem Jan,
>>>
>>>> Op 30 maart 2017 om 13:56 schreef Willem Jan Withagen
>>>> <wjw@xxxxxxxxxxx>:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I'm pleased to announce that my efforts to port to FreeBSD have
>>>> resulted in a ceph-devel port commit in the ports tree.
>>>>
>>>> https://www.freshports.org/net/ceph-devel/
>>>>
>>>
>>> Awesome work! I don't touch FreeBSD that much, but I can imagine that
>>> people want this.
>>>
>>> Out of curiosity, does this run on ZFS under FreeBSD? Or what
>>> Filesystem would you use behind FileStore with this? Or does
>>> BlueStore work?
>>
>> Since I'm a huge ZFS fan, that is what I run it on.
>
> Cool! The ZIL, ARC and L2ARC can actually make that very fast.

Interesting! Right, the ZIL is magic, and more or less equal to the
journal now used with OSDs, for exactly the same reason. The sad thing
is that a write is now journaled three times: once by Ceph and twice by
ZFS, which means that the bandwidth used to the SSDs is double what it
could be. I have had some discussion about this, but disabling the Ceph
journal is not just a matter of setting an option. I would like to test
the performance of an OSD with just the ZFS journal, but I expect that
the OSD journal is rather firmly integrated.

The really nice thing is that one does not need to worry about caching
for OSD performance: that is fully covered by ZFS, both by the ARC and
the L2ARC. And the ZIL and L2ARC can be constructed in all the shapes
and forms that ZFS vdevs can be built in. So for the ZIL you would
build an SSD mirror: double the write speed, but still redundant. For
the L2ARC I would concatenate two SSDs to get the read bandwidth (see
the sketch at the end of this mail). And contrary to some of the other
caches, ZFS does not return errors if the L2ARC devices go down (note
that data errors are detected by checksumming), so that again is one
less thing to worry about.

> CRC and Compression from ZFS are also very nice.

I did not want to go into too much detail, but this is a large part of
the reason. I tried compression a bit, but it costs quite a bit of
performance at the Ceph end, perhaps because the write to the journal
is synced and thus has to wait on both the compression and the synced
write. ZFS also brings snapshots without much hassle, but I have not
yet looked into if and how btrfs snapshots are used.

Another challenge is Ceph deep scrubbing: checking for corruption
within files. ZFS is able to detect corruption all by itself due to
extensive checksumming, and with something much stronger than crc32
(just putting on my fireproof suit). So I'm not certain that deep-scrub
would become obsolete, but I think the frequency could perhaps go down,
and/or it could be triggered by ZFS errors after scrubbing a pool.
Something that has much less impact on performance.

In some of the talks I give, I always try to explain to people that
RAID and RAID controllers are the current dinosaurs of IT.

>> To be honest I have not tested on UFS, but I would expect that the
>> xattrs are not long enough.
>>
>> BlueStore is not (yet) available because there is a different AIO
>> implementation on FreeBSD. But Sage thinks it is very doable to glue
>> in posix AIO. And one of my port reviewers has offered to look at it.
>> So it could be that BlueStore will be available in the foreseeable
>> future.
>>
>> --WjW
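
For reference, the ZIL/L2ARC layout I mean above is roughly the
following sketch (pool and device names are purely illustrative, not my
actual setup):

    # mirrored SLOG (ZIL) on two SSDs, so sync writes stay redundant
    zpool add osdpool log mirror ada1 ada2

    # two SSDs as L2ARC cache devices; ZFS stripes reads over them and
    # simply stops using them if they fail
    zpool add osdpool cache ada3 ada4

    # one dataset per OSD; checksumming is on by default, compression
    # is optional
    zfs create -o compression=lz4 -o atime=off osdpool/osd.0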