Hi Mathieu,

On 02/09/2015 14:10, Mathieu GAUTHIER-LAFAYE wrote:
> Hi All,
>
> We regularly have trouble with virtual machines using RBD storage.
> When we restart some virtual machines, they start to do filesystem
> checks. Sometimes it can be rescued, sometimes the virtual machine
> dies (Linux or Windows).

What is the cause of death as reported by the VM? Filesystem
inconsistency? Block device access timeout? ...

> We moved from Firefly to Hammer last month. I don't know if the
> problem is in Ceph and is still there, or if we continue to see the
> symptoms of a Firefly bug.
>
> We have two rooms in two separate buildings, so we set the replica
> size to 2. I wonder whether this can cause this kind of problem
> during scrubbing operations. I guess the recommended replica size is
> at least 3.

Scrubbing is pretty harmless; deep scrubbing is another matter.
Simultaneous deep scrubs on the same OSD are a performance killer. The
latest Ceph versions seem to provide some way of limiting their impact
on performance (scrubs are scheduled per PG, so two simultaneous
scrubs can and often do involve the same OSD, and I think there is now
a limit on concurrent scrubs per OSD). AFAIK Firefly doesn't have this
(and it certainly didn't when we were confronted with the problem), so
we developed our own deep scrub scheduler to avoid involving the same
OSD twice (in fact our scheduler tries to interleave scrubs so that
each OSD gets as much inactivity as possible after a deep scrub before
the next one). This helps a lot.

> We use BTRFS for the OSDs with kernel 3.10. This was not strongly
> discouraged when we started deploying Ceph last year. Now it seems
> that the kernel version should be 3.14 or later for this kind of
> setup.

See https://btrfs.wiki.kernel.org/index.php/Gotchas for various
reasons to upgrade. We have a good deal of experience with Btrfs in
production now.
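For illustration only, here is a minimal sketch of the interleaving
idea behind such a scheduler (not our actual code). It assumes you
have already extracted a PG-to-OSD mapping, e.g. from `ceph pg dump`,
and would then issue `ceph pg deep-scrub <pgid>` for each PG in the
returned order; the greedy heuristic and the `interleave_scrubs` name
are mine:

```python
from typing import Dict, List, Sequence


def interleave_scrubs(pg_osds: Dict[str, Sequence[int]]) -> List[str]:
    """Order PGs so each OSD rests as long as possible between two
    deep scrubs that touch it.

    pg_osds maps a PG id (e.g. "1.2f") to the OSDs in its acting set.
    Greedy heuristic: at each step, pick the PG whose most recently
    scrubbed OSD was scrubbed the longest ago (-1 = never scrubbed).
    """
    last_used: Dict[int, int] = {}  # OSD id -> step of its last scrub
    order: List[str] = []
    remaining = set(pg_osds)
    step = 0
    while remaining:
        # A PG's "freshness" is the latest step any of its OSDs was used.
        pg = min(sorted(remaining),
                 key=lambda p: max((last_used.get(o, -1)
                                    for o in pg_osds[p]), default=-1))
        order.append(pg)
        for osd in pg_osds[pg]:
            last_used[osd] = step
        step += 1
        remaining.discard(pg)
    return order
```

A driver would walk this list slowly (one PG per tick), so that the
gap between two scrubs hitting the same OSD is as wide as the PG
layout allows.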
We had to disable snapshots, make the journal NoCOW, disable
autodefrag, and develop our own background defragmenter (which
recompresses files with zlib at the same time it defragments, for
additional space savings). We currently run kernel 4.0.5 (we don't use
any RAID level, so we don't need 4.0.6, which fixes an online RAID
level conversion bug), and I wouldn't use anything older than 3.19.5.
The results are pretty good, but Btrfs is definitely not an
out-of-the-box solution for Ceph.

> Does anyone already have similar problems? Do you think it's related
> to our BTRFS setup? Is it the replica size of the pool?

It mainly depends on the answer to the first question above (is it a
corruption or a freezing problem?).

Lionel
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com