On 12/29/14 15:49, Thomas Lemarchand wrote:
> I too dislike the fact that it's not "native" (ie developed inside the
> Linux kernel), and this is why I'm not sure this project is a good
> solution.
>
> The userbase is necessarily much lower than it would be if this was
> native, so fewer tests, less feedback, and potentially less security.
>
> When I use ZFS on FreeBSD, I know it's widely used and tested.
>
> Since you can have multiple backend FS for your OSDs inside a Ceph
> cluster, what I do now is a mix between your alternatives 1 and 2:
> XFS for now, and upgrade to BTRFS once it is ready.
>
> On a test cluster (1 MON, 6 OSDs), I started with XFS (for a few
> months), then moved it to BTRFS (without losing a single bit) for a few
> months, then had a problem with BTRFS snapshots (without playing with
> any kind of snapshot in Ceph, weird).

Hi,

Ceph OSDs use BTRFS snapshots automatically. OSDs create and destroy
snapshots at a relatively high rate - according to recent answers on
this list, much higher than what the BTRFS developers expect. It seems
this prevents the BTRFS autodefragmenter from catching up, leading to
heavily fragmented OSDs.

I wonder how well BTRFS would work for OSDs if the Ceph devs disabled
snapshots on it. I guess it would prevent the current neat trick on
BTRFS of using a single write for both the journal and the data
directory updates, but we could at least benefit from lzo/zlib
compression, which would help both performance and capacity. This would
probably be a far more stable platform too: all the BTRFS bugs we
encountered were triggered by snapshots and/or the way Ceph uses
snapshots on BTRFS.

For people testing BTRFS OSDs, you might want to disable the
autodefragmenter and schedule periodic defragmentations instead; this
more brute-force approach *might* work much better than relying on the
autodefragmenter's heuristics (a rough sketch of such a job is appended
below my signature).

According to my last tests with BTRFS OSDs, performance degrades
slowly: on our setup and with our usage pattern, assuming manual
defragmentation solved the fragmentation problem, launching it once per
week would have been more than enough to keep performance above what
XFS provides (with a dedicated journal partition) on the same hardware.

The last time I checked, there were BTRFS stability bugs in various
kernel versions. The most stable kernel for us (the one where we
couldn't break BTRFS on our 10+ OSDs under a moderately high load) was
3.16.4; 3.17.0 and 3.17.1 had a nasty bug which occasionally remounted
the filesystem read-only.

Currently we have a pure XFS setup, but I will probably test this
strategy with additional OSDs the next time we raise our capacity. The
benefits are hard to ignore: journal writes are "free" on BTRFS (I
suppose there is a bit of overhead for creating the snapshots that make
this possible, but it's most probably far less than writing the same
data twice), and lzo compression works great for us, giving us 20-30%
additional space and most probably a small performance advantage too.

Best regards,

Lionel Bouton
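
PS: here is roughly what the weekly defragmentation job mentioned above
could look like. This is only a sketch, not what we run in production:
the OSD mount point pattern and the 32M extent size target are
assumptions you would have to adapt, and you should measure the I/O
impact before letting it loose on a loaded cluster.

    #!/bin/sh
    # /etc/cron.weekly/btrfs-osd-defrag (hypothetical helper, not part of Ceph)
    # Defragment every BTRFS-backed OSD data directory at idle I/O priority.
    # Assumes the autodefrag mount option has been removed from these mounts.
    # -r: recurse into the OSD directory
    # -t 32M: target extent size; a guess, tune it for your workload
    # Caveat: defragmenting extents shared with snapshots un-shares them,
    # which can temporarily increase space usage.
    for osd in /var/lib/ceph/osd/ceph-*; do
        ionice -c 3 btrfs filesystem defragment -r -t 32M "$osd"
    done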
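
PPS: the compression side is just a mount option. Something like the
following (device, mount point, and option values are illustrative; the
ceph.conf knob is the filestore-era "osd mount options btrfs" setting,
so check that it exists in your Ceph version before relying on it):

    # /etc/fstab: mount a BTRFS OSD with lzo compression, autodefrag left off
    /dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  rw,noatime,compress=lzo  0 0

    # or let Ceph mount it itself, in the [osd] section of ceph.conf:
    [osd]
    osd mount options btrfs = rw,noatime,compress=lzo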