Hi,

On 12/07/2016 02:51, Brad Hubbard wrote:
> [...]
>>>> This is probably a fragmentation problem: typical RBD access patterns
>>>> cause heavy BTRFS fragmentation.
>>> To the extent that operations take over 120 seconds to complete? Really?
>> Yes, really. I had these too. By default Ceph/RBD uses BTRFS in a very
>> aggressive way, rewriting data all over the place and creating/deleting
>> snapshots every filestore sync interval (5 seconds max by default IIRC).
>>
>> As I said, there are 3 main causes of performance degradation:
>> - the snapshots,
>> - the journal in a standard copy-on-write file (move it out of the FS
>>   or use NoCow),
>> - the weak auto-defragmentation of BTRFS (the autodefrag mount option).
>>
>> Each one of them is enough to impact or even destroy performance in the
>> long run. The 3 combined make BTRFS unusable by default. This is why
>> BTRFS is not recommended: if you want to use it you have to be prepared
>> for some (heavy) tuning. The first 2 points are easy to address; for
>> the last (which becomes noticeable once rewrites accumulate on your
>> data) I'm not aware of any tool other than the one we developed and
>> published on GitHub (link provided in previous mail).
>>
>> Another thing: you'd better have a recent 4.1.x or 4.4.x kernel on your
>> OSDs if you use BTRFS. We've run it since 3.19.x, but I wouldn't advise
>> that now: use 4.4.x if you can, 4.1.x otherwise.
> Thanks for the information. I wasn't aware things were that bad with
> BTRFS as I haven't had much to do with it up to this point.

"Bad" is relative. BTRFS was very time-consuming to set up (mainly
because of the defragmentation scheduler development, though finding the
sources of inefficiency was no picnic either), but used properly it has 3
unique advantages:
- data checksums: by refusing to hand over corrupted data, BTRFS forces
  Ceph to fall back to a good replica, which makes silent data corruption
  far easier to handle (some of our RAID controllers, probably damaged by
  electrical surges, had a nasty habit of flipping bits, so this was a
  big time/data saver for us),
- compression: you get more space for free,
- speed: we get better latencies with it than with XFS.

Until BlueStore is production-ready (it should address these points even
better than BTRFS does), unless I hit a use case where BTRFS falls on its
face, there's no way I'd use anything but BTRFS with Ceph.

Best regards,

Lionel
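P.S. For anyone who wants to try this, here is roughly what the tuning
looks like on our side. Treat it as a sketch, not a recipe: the paths and
device names below are made up for the example, and you should check the
option names against your own Ceph and kernel versions.

First, stop filestore from creating and deleting a BTRFS snapshot at
every sync interval; the OSD then relies on its journal for consistency,
as it does on XFS:

    # ceph.conf
    [osd]
    # don't use BTRFS snapshots for filestore consistency; this avoids
    # the constant snapshot create/delete churn described above
    filestore btrfs snap = false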
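Second, the journal. Either put it on a raw partition outside the
filesystem, or give the journal file the NoCow attribute *before* any
data is written to it (chattr +C has no effect on a file that already has
content). With a hypothetical OSD path:

    # option A: journal on a dedicated partition, set in ceph.conf:
    #   osd journal = /dev/disk/by-partlabel/ceph-journal-0
    # option B: NoCow journal file on the BTRFS volume:
    touch /var/lib/ceph/osd/ceph-0/journal
    chattr +C /var/lib/ceph/osd/ceph-0/journal  # file must still be empty here
    ceph-osd -i 0 --mkjournal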
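Third, the mount options. We mount the OSD volumes with noatime and
compress=lzo; autodefrag stays off because, as said above, it doesn't
keep up with RBD rewrite patterns, and our own scheduler handles the
defragmentation instead. An illustrative /etc/fstab line (device and
mount point invented for the example):

    /dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  noatime,compress=lzo  0 0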
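And on the checksum point: a periodic scrub makes BTRFS read everything
back and verify the checksums, so flipped bits get caught even on data
Ceph hasn't touched recently:

    btrfs scrub start -B /var/lib/ceph/osd/ceph-0  # -B: run in foreground, print stats
    btrfs scrub status /var/lib/ceph/osd/ceph-0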