On Wed, 18 Sept 2024 at 16:12, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > > That's actually OK. The first time around the loop, we haven't walked the > tree, so we start from the top as you'd expect. The only other reason to > go around the loop again is that memory allocation failed for a node, and > in that case we call xas_nomem() and that (effectively) calls xas_reset(). Well, that's quite subtle and undocumented. But yes, I see the (open-coded) xas_reset() in xas_nomem(). So yes, in practice it seems to be only the xas_split_alloc() path in there that can have this problem, but maybe this should at the very least be very documented. The fact that this bug was fixed basically entirely by mistake does say "this is much too subtle". Of course, the fact that an xas_reset() not only resets the walk, but also clears any pending errors (because it's all the same "xa_node" thing), doesn't make things more obvious. Because right now you *could* treat errors as "cumulative", but if a xas_split_alloc() does an xas_reset() on success, that means that it's actually a big conceptual change and you can't do the "cumulative" thing any more. End result: it would probably make sense to change "cas_split_alloc()" to explicitly *not* have that "check xas_error() afterwards as if it could be cumulative", and instead make it very clearly have no history and change the semantics to (a) return the error - instead of having people have to check for errors separately afterwards (b) do the xas_reset() in the success path so that it explicitly does *not* work for accumulating previous errors (which presumably was never really the intent of the interface, but people certainly _could_ use it that way). Linus