On Tue, 2023-07-18 at 11:03 -0700, Hugh Dickins wrote:
> On Tue, 18 Jul 2023, Jeff Layton wrote:
> > On Mon, 2023-07-17 at 20:43 -0700, Hugh Dickins wrote:
> > > Hi Jeff,
> > >
> > > I've been unable to run my kernel builds on ext4 on loop0 on tmpfs
> > > swapping load on linux-next recently, on one machine: various kinds
> > > of havoc, most common symptoms being ext4_find_dest_de:2107 errors,
> > > systemd-journald errors, segfaults. But no problem observed running
> > > on a more recent installation.
> > >
> > > Bisected yesterday to 979492850abd ("ext4: convert to ctime accessor
> > > functions").
> > >
> > > I've mostly averted my eyes from the EXT4_INODE macro changes there,
> > > but I think that's where the problem lies. Reading the comment in
> > > fs/ext4/ext4.h above EXT4_FITS_IN_INODE() led me to try "tune2fs -l"
> > > and look at /etc/mke2fs.conf. It's an old installation, its own
> > > inodes are 256, but that old mke2fs.conf does default to 128 for small
> > > FSes, and what I use for the load test is small. Passing -I 256 to the
> > > mkfs makes the problems go away.
> > >
> > > (What's most alarming about the corruption is that it appears to extend
> > > beyond just the throwaway test filesystem: segfaults on bash and libc.so
> > > from the root filesystem. But no permanent damage done there.)
> > >
> > > One oddity I noticed in scrutinizing that commit, didn't help with
> > > the issues above, but there's a hunk in ext4_rename() which changes
> > > -	old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
> > > +	old.dir->i_mtime = inode_set_ctime_current(old.inode);
> > >
> > >
> >
> > I suspect the problem here is the i_crtime, which lives wholly in the
> > extended part of the inode. The old macros would just not store anything
> > if the i_crtime didn't fit, but the new ones would still store the
> > tv_sec field in that case, which could be a memory corruptor. This patch
> > should fix it, and I'm testing it now.
>
> That makes sense.
>
> >
> > Hugh, if you're able to give this a spin on your setup, then that would
> > be most helpful. This is also in the "ctime" branch in my kernel.org
> > tree if that helps. If this looks good, I'll ask Christian to fold this
> > into the ext4 conversion patch.
>
> Yes, it's now running fine on the problem machine, and on the no-problem.
>
> Tested-by: Hugh Dickins <hughd@xxxxxxxxxx>
>
> >
> > Thanks for the bug report!
>
> And thanks for the quick turnaround!
>
> But I'm puzzled by your dismissing that
> -	old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
> +	old.dir->i_mtime = inode_set_ctime_current(old.inode);
> in ext4_rename() as "actually looks fine".
>
> Different issue, nothing to do with the corruption, sure. Much less
> important, sure. But updating ctime on the wrong inode is "fine"?

Ahh, sorry I wasn't looking at that properly. I think you're correct.
The right fix is probably to move ext4 to use generic_rename_timestamp.
I'll test and send another patch for that.

Thanks again!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
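
The failure mode discussed in this thread (storing i_crtime into an inode that has no extended area) can be modeled in user space. What follows is only a rough sketch under stated assumptions, not the kernel macros or the actual fix: the names field_fits(), store_crtime_guarded() and store_crtime_unguarded() are hypothetical, the offset is chosen for illustration, and the "inode table" is just a byte buffer holding two adjacent 128-byte inodes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OLD_INODE_SIZE 128u  /* size of the classic, non-extended inode */
#define CRTIME_OFFSET  144u  /* illustrative offset inside the extended area */

/*
 * True if a field occupying [offset, offset + len) lies within the bytes
 * this inode actually owns: the 128-byte base plus extra_isize bytes of
 * extended area. This mirrors the idea of a "fits in inode" check only.
 */
static bool field_fits(size_t offset, size_t len, size_t extra_isize)
{
	return offset + len <= OLD_INODE_SIZE + extra_isize;
}

/* Guarded store: skip the write entirely when the field does not fit. */
static bool store_crtime_guarded(uint8_t *raw, size_t extra_isize,
				 uint32_t sec)
{
	if (!field_fits(CRTIME_OFFSET, sizeof(sec), extra_isize))
		return false;
	memcpy(raw + CRTIME_OFFSET, &sec, sizeof(sec));
	return true;
}

/*
 * Unguarded store: always writes. With a 128-byte inode the bytes land in
 * whatever follows in the buffer (in this model, the next inode).
 */
static void store_crtime_unguarded(uint8_t *raw, uint32_t sec)
{
	memcpy(raw + CRTIME_OFFSET, &sec, sizeof(sec));
}
```

With a 256-byte buffer standing in for two neighbouring 128-byte inodes, calling store_crtime_guarded() on the first inode with extra_isize == 0 leaves the buffer untouched, while store_crtime_unguarded() overwrites bytes belonging to the second inode. That is consistent with the symptoms going away when the filesystem is made with -I 256.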