On Tue, Sep 17, 2024 at 01:13:05PM +0200, Chris Mason wrote:
> On 9/17/24 5:32 AM, Matthew Wilcox wrote:
> > On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
> >> I've got a bunch of assertions around incorrect folio->mapping and I'm
> >> trying to bash on the ENOMEM for readahead case. There's a GFP_NOWARN
> >> on those, and our systems do run pretty short on ram, so it feels right
> >> at least. We'll see.
> >
> > I've been running with some variant of this patch the whole way across
> > the Atlantic, and not hit any problems. But maybe with the right
> > workload ...?
> >
> > There are two things being tested here. One is whether we have a
> > cross-linked node (ie a node that's in two trees at the same time).
> > The other is whether the slab allocator is giving us a node that already
> > contains non-NULL entries.
> >
> > If you could throw this on top of your kernel, we might stand a chance
> > of catching the problem sooner. If it is one of these problems and not
> > something weirder.
> >
>
> This fires in roughly 10 seconds for me on top of v6.11. Since array seems
> to always be 1, I'm not sure if the assertion is right, but hopefully you
> can trigger yourself.

Whoops.

$ git grep XA_RCU_FREE
lib/xarray.c:#define XA_RCU_FREE ((struct xarray *)1)
lib/xarray.c: node->array = XA_RCU_FREE;

so you walked into a node which is currently being freed by RCU. Which
isn't a problem, of course. I don't know why I do that; it doesn't seem
like anyone tests it. The jetlag is seriously kicking in right now,
so I'm going to refrain from saying anything more because it probably
won't be coherent.