On 9/18/24 10:32 PM, Linus Torvalds wrote: > On Thu, 19 Sept 2024 at 05:38, Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> I kicked off a quick run with this on 6.9 with my debug patch as well, >> and it still fails for me... I'll double check everything is sane. For >> reference, below is the 6.9 filemap patch. Confirmed with a few more runs, still hits, basically as quickly as it did before. So no real change observed with the added xas_reset(). > Ok, that's interesting. So it's *not* just about "that code didn't do > xas_reset() after xas_split_alloc()". > > Now, another thing that commit 6758c1128ceb ("mm/filemap: optimize > filemap folio adding") does is that it now *only* calls xa_get_order() > under the xa lock, and then it verifies it against the > xas_split_alloc() that it did earlier. > > The old code did "xas_split_alloc()" with one order (all outside the > lock), and then re-did the xas_get_order() lookup inside the lock. But > if it changed in between, it ended up doing the "xas_split()" with the > new order, even though "xas_split_alloc()" was done with the *old* > order. > > That seems dangerous, and maybe the lack of xas_reset() was never the > *major* issue? > > Willy? You know this code much better than I do. Maybe we should just > back-port 6758c1128ceb in its entirety. > > Regardless, I'd want to make sure that we really understand the root > cause. Because it certainly looks like *just* the lack of xas_reset() > wasn't it. Just for sanity's sake, I backported 6758c1128ceb (and the associated xarray xas_get_order() change) to 6.9 and kicked that off. -- Jens Axboe