Re: [PATCH] mm/page-writeback: Consolidate wb_thresh bumping logic into __wb_calc_thresh

Guenter Roeck <linux@xxxxxxxxxxxx> · Thu, 16 Jan 2025 08:05:59 -0800

On Thu, Jan 16, 2025 at 03:56:57PM +0100, Jan Kara wrote:
...
> > 
> > Interesting. Is there some endianness issue, by any chance ? I only see the problem
> > with sheb (big endian), not with sh (little endian). I'd suspect that it is an
> > emulation bug, but it is odd that the problem did not show up before.
> 
> So far I don't have a good explanation. Let me write down here the facts,
> maybe it will trigger the aha effect.
> 
> 1) Ext2 stores the metadata in little endian ordering. We observe the
> problem with the first directory entry in the folio. Both entry->rec_len
> (16-bit) and entry->inode (32-bit) appear to be seen in wrong endianity
> 
> 2) The function that fails is ext2_check_folio(). We kmap_local() the folio
> in ext2_get_folio(), then in ext2_check_folio() we do:
> 
> 	ext2_dirent *p;
> 
> 	p = (ext2_dirent *)(kaddr + 0);
> 	rec_len = ext2_rec_len_from_disk(p->rec_len);
> 	^^^ value 3072 == 0x0c00 seen here instead of correct 0x000c
> 	this value is invalid so we go to:
> 	ext2_error(sb, __func__, "bad entry in directory #%lu: : %s - "
>                         "offset=%llu, inode=%lu, rec_len=%d, name_len=%d",
>                         dir->i_ino, error, folio_pos(folio) + offs,
>                         (unsigned long) le32_to_cpu(p->inode),
>                         rec_len, p->name_len);
> 
> 	Here rec_len is printed so we see the wrong value. Also
> le32_to_cpu(p->inode) is printed which also shows number with swapped byte
> ordering (the message contains inode number 27393 == 0x00006b01 but the
> proper inode number is 363 == 0x0000016b). This actually releals more about
> the problem because only the two bytes were swapped in the inode number
> although we always treat it as 32-bit entity. So this would indeed point
> more at some architectural issue rather than a problem in the filesystem /
> MM.
> 
> Note that to get at this point in the boot we must have correctly
> byteswapped many other directory entries in the filesystem. So the problem
> must be triggered by some parallel activity happening in the system or
> something like that.
> 
> 3) The problem appears only with MTD storage, not with IDE/SATA on the same
> system + filesystem image. It it unclear how the storage influences the
> reproduction, rather than that it influences timing of events in the
> system.
> 
> 4) The problem reliably happens with "mm/page-writeback: Consolidate wb_thresh
> bumping logic into __wb_calc_thresh", not without it. All this patch does
> is that it possibly changes a limit at which processes dirtying pages in
> the page cache get throttled. Thus there are fairly limited opportunities
> for how it can cause damage (I've checked for possible UAF issues or memory
> corruption but I don't really see any such possibility there, it is just
> crunching numbers from the mm counters and takes decision based on the
> result). This change doesn't have direct on the directory ext2 code. The only
> thing it does is that it possibly changes code alignment of ext2 code if it
> gets linked afterwards into vmlinux image (provided ext2 is built in). Another
> possibility is again that it changes timing of events in the system due to
> differences in throttling of processes dirtying page cache.
> 
> So at this point I don't have a better explanation than blame the HW. What
> really tipped my conviction in this direction is the 16-bit byteswap in a
> 32-bit entity. Hence I guess I'll ask Andrew to put Jim's patch back into
> tree if you don't object.

I agree that this is most likely an emulation problem (it would not be the
first one), so please go ahead.

Thanks,
Guenter