On Wed, 18 Sept 2024 at 15:35, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> Oh god, that's it.
>
> there should have been an xas_reset() after calling xas_split_alloc().

I think it is worse than that.

Even *without* an xas_split_alloc(), I think the old code was wrong,
because it drops the xas lock without doing the xas_reset.

> i wonder if xas_split_alloc() should call xas_reset() to prevent this
> from ever being a problem again?

See above: I think the code in filemap_add_folio() was buggy entirely
unrelated to the xas_split_alloc(), although it is probably *much*
easier to trigger issues with it (ie the alloc will just make any
races much bigger).

But even when it doesn't do the alloc, it takes and drops the lock,
and it's unclear how much xas state it just randomly re-uses over the
lock drop.

(Maybe none of the other operations end up mattering, but it does look
very wrong.)

So I think it might be better to do the xas_reset() when you do the
xas_lock_irq(), no? Isn't _that_ a more logical point where "any old
state is unreliable, now we need to reset the walk"?

            Linus
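The suggestion above (reset the walk state at the point the lock is taken, rather than after one specific helper) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel XArray API: `struct table`, `struct walk`, `walk_lock()`, `walk_reset()` and `walk_load()` are hypothetical names, with a pthread mutex standing in for the xa_lock and a cached slot pointer standing in for the cached node in struct xa_state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared table; 'slots' may be replaced while the lock is dropped. */
struct table {
    pthread_mutex_t lock;
    int *slots;
    size_t nr;
};

/* Hypothetical walk state, loosely modelled on struct xa_state:
 * 'node' caches a pointer found by an earlier walk; NULL means "restart". */
struct walk {
    struct table *t;
    size_t index;
    int *node;
};

/* Forget any cached position so the next load re-walks from the top. */
static void walk_reset(struct walk *w)
{
    w->node = NULL;
}

/* Taking the lock also invalidates cached state: anything remembered from
 * before the lock was dropped may point at memory that has since changed. */
static void walk_lock(struct walk *w)
{
    pthread_mutex_lock(&w->t->lock);
    walk_reset(w);
}

static void walk_unlock(struct walk *w)
{
    pthread_mutex_unlock(&w->t->lock);
}

/* Must be called with the lock held; re-walks only if the state was reset. */
static int *walk_load(struct walk *w)
{
    if (w->node == NULL) {
        assert(w->index < w->t->nr);
        w->node = &w->t->slots[w->index];
    }
    return w->node;
}
```

Because the reset lives inside `walk_lock()`, no caller can forget it after an allocation or any other lock drop. Without it, a second `walk_load()` would return the pointer cached during the first locked section, even if the table had been replaced in between.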