I have now several times seen the i_blocks field of a few inodes
become corrupted (replaced by extremely large numbers).
I don't believe that it is a disk error - the file system is on a RAID1
partition and the RAID consistency is checked regularly.
I also find it hard to believe that it is a RAM error - the machine has
run memtest86+ overnight without finding anything.
The files I've seen corrupted are simple small text files that are
modified only using an ordinary text editor (emacs).
Fsck fixes it.
The system is an up-to-date Debian Bookworm:
Linux nuser 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
I do one thing that is not the default for ext4: I use the "nodelalloc"
option. Several years ago, there was a discussion about "delalloc or
not" from which I got the impression that nodelalloc was probably
slightly safer, provided the resulting performance reduction is not a
problem, which it is not for me. The mount line is:
/dev/md0 on / type ext4 (rw,relatime,nodelalloc,errors=remount-ro)
Three examples follow below. Note that the bad field values, when
interpreted as 48-bit signed numbers, are numerically small negative
numbers (-25, -9, -3, respectively).
Excerpts from the fsck logs:
root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
root: Inode 10748542, i_blocks is 281474976710653, should be 1. FIXED.
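
The signed interpretation mentioned above can be checked with a short
Python sketch. It only redoes the arithmetic on the three values from
the fsck log; the 48-bit width is the assumption already stated, and the
helper name as_signed is made up for illustration. It says nothing about
what causes the corruption.

```python
# Reinterpret the i_blocks values reported by fsck as 48-bit
# two's-complement signed integers (the 48-bit width is an assumption).
BITS = 48


def as_signed(value: int, bits: int = BITS) -> int:
    """Interpret an unsigned `bits`-wide integer as two's-complement signed."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >= 1 << (bits - 1):      # top bit set: value is negative
        return value - (1 << bits)
    return value


# (inode, i_blocks reported by fsck, value fsck says it should be),
# copied from the log excerpts
reports = [
    (10748715, 281474976710631, 5),
    (10751288, 281474976710647, 3),
    (10748542, 281474976710653, 1),
]

for inode, reported, expected in reports:
    print(f"inode {inode}: i_blocks {reported} is {as_signed(reported)} "
          f"as signed {BITS}-bit, should be {expected}")
```

Each reported value is 2^48 (281474976710656) minus a small number, which
is why the three values come out as -25, -9 and -3.
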
I don't know when the first two of these corruptions occurred, but the
last one happened yesterday or the day before. The file in question was
/etc/fstab, and I discovered the problem after I had edited fstab on
Wednesday and rebooted on Thursday.
The corrupted files can be read and copied without problems. I did not
dare to delete any of those files until fsck had fixed them.
What is going on here?
Thanks,
Jesper
--
Jesper Dybdal
https://www.dybdal.dk