On Sun, Nov 22, 2020 at 06:38:28PM +0000, Nick Alcock wrote:
> So I just tried to reboot my x86 server box from 5.9.6 to 5.9.10 and my

Sorry about that, there was a bad patch in -rc4 that got sucked into
5.9.9 because it had a fixes tag.  The revert is already upstream:

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?id=eb8409071a1d47e3593cfe077107ac46853182ab

--D

> system oopsed with an xfs fs corruption message when I kicked up
> Chromium on another machine which mounted $HOME from the server box (it
> panicked without logging anything, because the corruption was detected
> on the rootfs, and it is also the loghost). A subsequent reboot died
> instantly as soon as it tried to mount root, but the next one got all
> the way to starting Chromium before dying again the same way.
>
> Rebooting back into 5.9.6 causes everything to work fine again, no
> reports of corruption and starting Chromium works.
>
> This fs has rmapbt and reflinks enabled, on a filesystem originally
> created by xfsprogs 4.10.0, but I have never knowingly used them under
> the Chromium config dirs (or, actually, under that user's $HOME at all).
> I've used them extensively elsewhere on the fs though. The FS is sitting
> above a libata -> md-raid6 -> bcache stack. (It is barely possible that
> bcache is at fault, but bcache has seen no changes since 5.9.6 so I
> doubt it.)
>
> The relevant bits of the log I could capture -- no console scrollback
> these days, of course :( and it was a panic anyway so the top is just
> lost -- are in a photo here:
>
> <http://www.esperi.org.uk/~nix/temporary/xfs-crash.jpg>
>
> The mkfs line used to create this fs was:
>
> mkfs.xfs -m rmapbt=1,reflink=1 -d agcount=17,sunit=$((128*8)),swidth=$((384*8)) -l logdev=/dev/sde3,size=521728b -i sparse=1,maxpct=25 /dev/main/root
>
> (/dev/sde3 is an SSD which also hosts the bcache and RAID journal,
> though this RAID device is not journalled, and is operating fine.)
>
> I am not using a realtime device.
>
> I have *not* yet run xfs_repair, but just rebooted back into the old
> kernel, since everything worked there: I'll run xfs_repair over the fs
> if you think it wise to do so, but right now I have a state which
> crashes on one kernel and works on another one, which seems useful to
> not try to fix in case you have some use for it.
>
> Since everything is working fine in 5.9.6 and there were XFS changes
> after that, I'm hypothesising that this is probably a bug in the
> post-5.9.6 changes rather than anything xfs_repair should be trying to
> fix. But I really don't know :)
>
> (I can't help but notice that all these post-5.9.6 XFS changes were
> sucked in by Sasha's magic regression-hunting stable-tree AI, which I
> thought wasn't meant to happen -- but I've not been watching closely,
> and if you changed your minds after the LWN article went in I won't have
> seen it.)