> I used NILFS over ISCSI. I had random block corruption during
> one week, silently destroying data until NILFS finally
> crashed. First of all, I thought about a NILFS bug, so I
> created a BTRFS volume

I use both for main filesystems and for backups, for "diversity",
and I value NILFS2 because it is very robust (I don't really use
either filesystem's snapshotting features).

> and restored the backup from one week earlier to it. After
> minutes, the BTRFS volume gave checksum errors, so the
> culprit was found, the ISCSI server.

There used to be a good argument that checksumming (or compressing)
data should be done end-to-end, and that checksumming (or
compressing) in the filesystem is a bit too much. Besides, when
LOGFS and NILFS/NILFS2 were designed I guess CPUs were too slow to
checksum everything. Even excellent recent filesystems like F2FS
don't do data integrity checking, for various reasons.

In theory your iSCSI server or its host adapter should have told you
about errors... Many can enable after-write verification (even if
it's quite expensive).

Alternatively, some people regularly run daemons that detect silent
corruption, in case their hardware does not report corruption or the
corruption escapes the relevant checks for various reasons:

https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf
https://storagemojo.com/2007/09/19/cerns-data-corruption-research/

> [...] NILFS creates checksums on block writes. It would really
> be a good addition to verify these checksums on read [...]

It would be interesting to have data integrity checking or
compression in NILFS2, and a log-structured filesystem makes that
easier (the Btrfs code is rather complex by comparison), but
modifying mature and stable filesystems is a risky thing...

My understanding is that these checksums are not quite suitable for
data integrity checks but are designed for log-sequence recovery, a
bit like journal checksums in journal-based filesystems.
https://www.spinics.net/lists/linux-nilfs/msg01063.html

  "nilfs2 store checksums for all data. However, at least the
  current implementation does not verify it when reading.
  Actually, the main purpose of the checksums is recovery after
  unexpected reboot; it does not suit for per-file data
  verification because the checksums are given per ``log''."
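
To illustrate why a per-log checksum does not suit verification on read, here is a small toy model in Python. It is only a sketch and not the NILFS2 on-disk format: the 4096-byte block size, the use of zlib's CRC-32, and all function names are my own assumptions. It shows that a single checksum over a whole log says that something in the log is damaged, but only after reading every block of it, while one checksum per block (roughly what Btrfs-style data checksumming gives you) points at the damaged block directly.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for this toy model


def per_log_checksum(blocks):
    """One CRC-32 over all blocks of a log, chained in write order."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def per_block_checksums(blocks):
    """One independent CRC-32 per block."""
    return [zlib.crc32(block) for block in blocks]


def verify_log(blocks, expected_crc):
    """Needs every block of the log, even to check a single one."""
    return per_log_checksum(blocks) == expected_crc


def find_bad_blocks(blocks, expected_crcs):
    """Return the indexes of blocks whose CRC-32 does not match."""
    return [
        index
        for index, (block, crc) in enumerate(zip(blocks, expected_crcs))
        if zlib.crc32(block) != crc
    ]


def make_log(block_count):
    """Build a hypothetical log of blocks with distinct contents."""
    return [bytes([i % 256]) * BLOCK_SIZE for i in range(block_count)]


# "Write" a log and record both kinds of checksum.
log = make_log(8)
log_crc = per_log_checksum(log)
block_crcs = per_block_checksums(log)

# Simulate silent corruption: flip one bit in block 5.
damaged = list(log)
corrupted = bytearray(damaged[5])
corrupted[100] ^= 0x01
damaged[5] = bytes(corrupted)

print(verify_log(damaged, log_crc))          # False
print(find_bad_blocks(damaged, block_crcs))  # [5]
```

With the per-log checksum, reading one file block would mean reading and checksumming the whole log it belongs to, which is fine for recovery after a crash but far too expensive to do on every read.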