Re: Corrupted i_blocks field

Andreas Dilger <adilger@xxxxxxxxx> · Mon, 30 Sep 2024 14:29:40 -0600

On Sep 27, 2024, at 8:38 AM, Jesper Dybdal <jd-ext4@xxxxxxxxx> wrote:
> 
> I have now a few times experienced a problem with the i_blocks field of a few inodes being corrupted (replaced by extremely large numbers).
> 
> I don't believe that it is a disk error - the file system is on a RAID1 partition and the RAID consistency is checked regularly.
> I also find it hard to believe that it is a RAM error - the machine has run memtest86+ overnight without finding anything.
> 
> The files I've seen corrupted are simple small text files that are modified only using an ordinary text editor (emacs).
> 
> Fsck fixes it.
> The system is an up-to-date Debian Bookworm:
>     Linux nuser 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
> 
> I do one thing that is not the default for ext4: I use the "nodelalloc" option (because several years ago, there was a discussion about "delalloc or not" from which I got the impression that nodelalloc was probably slightly safer - if the resulting performance reduction is not a problem, which it is not for me):
>      /dev/md0 on / type ext4 (rw,relatime,nodelalloc,errors=remount-ro)
> 
> Three examples follow below.  Note that the bad field values, when interpreted as 48-bit signed numbers, are numerically small negative numbers (-25, -9, -3, respectively).
> 
> Excerpts from the fsck logs:
> root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
> root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
> root: Inode 10748542, i_blocks is 281474976710653, should be 1. FIXED.
> 
> I don't know when the first two of these corruptions occurred, but the last one happened yesterday or the day before.  The file in question was /etc/fstab, and I discovered the problem after I had edited fstab on Wednesday and rebooted on Thursday.
> 
> The corrupted files can be read and copied without problems.  I have not dared to delete any of those files before fsck had fixed them.
> 
> What is going on here?

This looks like an underflow of the used blocks count on the inode:

    281474976710631 = 0xffffffffffe7
    281474976710647 = 0xfffffffffff7
    281474976710653 = 0xfffffffffffd
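
For illustration, here is a short Python sketch (not part of any ext4
tooling) that reproduces the conversion. It treats each reported value as a
48-bit two's-complement number, which is an assumption based on the field
widths; "as_signed" is a hypothetical helper name.

```python
# i_blocks values reported by e2fsck in the original message.
reported = [281474976710631, 281474976710647, 281474976710653]

WIDTH = 48  # assumed width: 32-bit i_blocks_lo + 16-bit i_blocks_hi

def as_signed(value: int, width: int = WIDTH) -> int:
    """Reinterpret an unsigned width-bit value as two's-complement signed."""
    if value >= 1 << (width - 1):
        return value - (1 << width)
    return value

for v in reported:
    # '#014x' prints '0x' followed by 12 hex digits (48 bits).
    print(f"{v} = {v:#014x} -> {as_signed(v)}")

# 281474976710631 = 0xffffffffffe7 -> -25
# 281474976710647 = 0xfffffffffff7 -> -9
# 281474976710653 = 0xfffffffffffd -> -3
```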

These values are just below 2^48 blocks, which is the limit for the number
of blocks that fits into the available inode fields (32-bit i_blocks_lo,
16-bit i_blocks_hi), i.e. small negative counts that have wrapped around.
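
A simplified model of that wraparound, assuming the block count is a plain
integer that is masked to 48 bits when it is split into the two on-disk
fields. The real ext4 code path may differ. The helper names and the
charged/released numbers below are hypothetical and only illustrate how
releasing more blocks than were charged produces values like the ones above.

```python
MASK_48 = (1 << 48) - 1

def split_i_blocks(count: int) -> tuple[int, int]:
    """Hypothetical: store a count into 32-bit lo and 16-bit hi fields.

    Masking to 48 bits makes a negative count wrap modulo 2^48.
    """
    count &= MASK_48
    return count & 0xFFFFFFFF, count >> 32

def combine_i_blocks(lo: int, hi: int) -> int:
    """Hypothetical: rebuild the 48-bit count from the two fields."""
    return (hi << 32) | lo

# Hypothetical accounting error: more blocks released than were charged.
charged = 8
released = 33
lo, hi = split_i_blocks(charged - released)  # charged - released == -25
total = combine_i_blocks(lo, hi)
print(total, hex(total))  # 281474976710631 0xffffffffffe7
```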

There is likely some kind of accounting error in the code.  Is there anything
unusual about the access patterns for those files (large xattrs/ACLs; are they
regular files, directories, or special files; mmap, truncate, fallocate, etc.)?

If you are able to reproduce this by editing /etc/fstab, strace could
possibly help identify whether something unusual is being done to the file.

Cheers, Andreas
