On Sep 27, 2024, at 8:38 AM, Jesper Dybdal <jd-ext4@xxxxxxxxx> wrote:
>
> I have now a few times experienced a problem with the i_blocks field of a few inodes being corrupted (replaced by extremely large numbers).
>
> I don't believe that it is a disk error - the file system is on a RAID1 partition and the RAID consistency is checked regularly.
> I also find it hard to believe that it is a RAM error - the machine has run memtest86+ overnight without finding anything.
>
> The files I've seen corrupted are simple small text files that are modified only using an ordinary text editor (emacs).
>
> Fsck fixes it.
> The system is an up-to-date Debian Bookworm:
> Linux nuser 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
>
> I do one thing that is not the default for ext4: I use the "nodelalloc" option (because several years ago, there was a discussion about "delalloc or not" from which I got the impression that nodelalloc was probably slightly safer - if the resulting performance reduction is not a problem, which it is not for me):
> /dev/md0 on / type ext4 (rw,relatime,nodelalloc,errors=remount-ro)
>
> Three examples follow below. Note that the bad field values, when interpreted as 48-bit signed numbers, are numerically small negative numbers (-25, -9, -3, respectively).
>
> Excerpts from the fsck logs:
> root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
> root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
> root: Inode 10748542, i_blocks is 281474976710653, should be 1. FIXED.
>
> I don't know when the first two of these corruptions occurred, but the last one happened yesterday or the day before. The file in question was /etc/fstab, and I discovered the problem after I had edited fstab on Wednesday and rebooted on Thursday.
>
> The corrupted files can be read and copied without problems. I have not dared to delete any of those files before fsck had fixed them.
>
> What is going on here?

This looks like an underflow of the used blocks count on the inode:

    281474976710631 = 0xffffffffffe7
    281474976710647 = 0xfffffffffff7
    281474976710653 = 0xfffffffffffd

These values sit just below 2^48 blocks, which is the limit for the number of blocks that fit into the available inode fields (32-bit i_blocks_lo, 16-bit i_blocks_hi), so the counter appears to have wrapped around after going below zero. There is likely some kind of accounting error in the code.

Is anything unusual with access patterns for those files (large xattrs/ACLs; are they regular files, directories, or special files; mmap, truncate, fallocate, etc.)?

If you are able to reproduce with the /etc/fstab editing, possibly strace could help to identify if something unusual is being done to the file.

Cheers, Andreas
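
To make the underflow reading above concrete, here is a minimal Python sketch of the arithmetic. It only re-derives the numbers already quoted from the fsck logs; it is not taken from the ext4 code. The helper names (split_i_blocks, as_signed48, looks_underflowed) are hypothetical, and the last helper assumes that stat() on the affected kernel reports the same corrupted count that fsck sees, which has not been verified in this thread.

```python
import os

BITS = 48
MASK48 = (1 << BITS) - 1


def split_i_blocks(value):
    """Split a 48-bit block count into (i_blocks_lo, i_blocks_hi).

    i_blocks_lo is the low 32 bits, i_blocks_hi the high 16 bits.
    """
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFF


def as_signed48(value):
    """Reinterpret a 48-bit unsigned value as a two's-complement signed one."""
    value &= MASK48
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value


def looks_underflowed(path):
    """Heuristic (assumption): a block count with bit 47 set is implausible.

    Uses st_blocks from lstat(); whether this mirrors the on-disk i_blocks
    in the corrupted case is an assumption, not something confirmed here.
    """
    return os.lstat(path).st_blocks >= (1 << (BITS - 1))


if __name__ == "__main__":
    # (bad i_blocks, value fsck said it should be), copied from the fsck log
    reported = [
        (281474976710631, 5),
        (281474976710647, 3),
        (281474976710653, 1),
    ]
    for bad, should_be in reported:
        lo, hi = split_i_blocks(bad)
        signed = as_signed48(bad)
        print(f"{bad:#x}: lo={lo:#010x} hi={hi:#06x} "
              f"signed={signed} off_by={should_be - signed}")
```

For the three reported inodes the signed values come out as -25, -9 and -3, matching the original report, and in each case i_blocks_hi is 0xffff. A check like looks_underflowed("/etc/fstab") might help spot affected files between fsck runs, under the assumption stated above.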