Hi,

Following the sf.net corruption report I've been checking our config w.r.t. data consistency. AFAIK the two main recommendations are:

  1) don't mount FileStores with nobarrier
  2) disable write-caching (hdparm -W 0 /dev/sdX) when using block dev journals and your kernel is < 2.6.33

Obviously we don't do (1) because that would be crazy, but for (2) we haven't yet disabled write-caching, probably because we didn't notice the doc. But my lame excuse is that apparently _check_disk_write_cache in FileJournal.cc doesn't print a warning when it should, because hdparm -W doesn't always work on partitions, as opposed to whole block devices. See:

GOOD: ceph 0.94.2, kernel 3.10.0-229.7.2.el7.x86_64, hdparm v9.43:

  10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 21474836480)
  20 journal _check_disk_write_cache: disk write cache is on, but your kernel is new enough to handle it correctly. (fn:/var/lib/ceph/osd/ceph-96/journal)
  1 journal _open /var/lib/ceph/osd/ceph-96/journal fd 20: 21474836480 bytes, block size 4096 bytes, directio = 1, aio = 1

BAD: ceph 0.94.2, kernel 2.6.32-431.29.2.el6.x86_64, hdparm v9.43:

  10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 21474836480)
  1 journal _open /var/lib/ceph/osd/ceph-56/journal fd 19: 21474836480 bytes, block size 4096 bytes, directio = 1, aio = 1

In other words, running hammer on EL6, _check_disk_write_cache returns without printing anything, when it should actually log the scary "WARNING: disk write cache is ON". I guess it's because of this:

GOOD

  # uname -r && hdparm -W /dev/sda && hdparm -W /dev/sda1
  3.10.0-229.7.2.el7.x86_64
  /dev/sda1:
   write-caching = 1 (on)
  /dev/sda:
   write-caching = 1 (on)

BAD

  # uname -r && hdparm -W /dev/sda && hdparm -W /dev/sda1
  2.6.32-431.23.3.el6.x86_64
  /dev/sda:
   write-caching = 1 (on)
  /dev/sda1:
   HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device

(in both cases /dev/sda is an INTEL SSDSC2BA20).
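
To make the failure mode concrete, here is a rough Python sketch of the decision I would expect from such a check. It is not the FileJournal.cc code; the function names and the return labels ("off", "on-safe", "on-unsafe", "unknown") are made up for illustration. It also only compares the version number from uname -r against 2.6.33, so it cannot see any vendor backports.

```python
import re

# Kernel version below which write-caching is considered unsafe,
# taken from the recommendation quoted in this mail.
SAFE_KERNEL = (2, 6, 33)


def kernel_tuple(release):
    """Parse the leading numeric part of a `uname -r` string into a tuple."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(x or 0) for x in m.groups())


def write_cache_state(hdparm_output):
    """Return True (on), False (off), or None if hdparm gave no answer."""
    m = re.search(r"write-caching\s*=\s*(\d)", hdparm_output)
    if m is None:
        # e.g. "HDIO_DRIVE_CMD(identify) failed" when run on a partition
        return None
    return m.group(1) == "1"


def verdict(release, hdparm_output):
    """Hypothetical labels: 'off', 'on-safe', 'on-unsafe', 'unknown'."""
    state = write_cache_state(hdparm_output)
    if state is None:
        # The important case: do not stay silent when hdparm fails.
        return "unknown"
    if not state:
        return "off"
    if kernel_tuple(release) >= SAFE_KERNEL:
        return "on-safe"
    return "on-unsafe"


if __name__ == "__main__":
    el6_partition = ("/dev/sda1:\n HDIO_DRIVE_CMD(identify) failed: "
                     "Inappropriate ioctl for device\n")
    print(verdict("2.6.32-431.23.3.el6.x86_64", el6_partition))
```

Fed the EL6 partition output above, this yields "unknown" instead of silently returning, which is the case where I would want at least a log line telling the admin to check the whole device by hand.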
So a few questions to end this:

  1) What was the magic patch in 2.6.33 which made write-caching safe?
  2) What's the recommended recourse here? Hopefully Red Hat backported the necessary fix to their 2.6.32 kernel, but if not, should we fix _check_disk_write_cache and publicize this so that people check their configs?

Best Regards,
Dan