Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

Chris Mason <clm@xxxxxxxx> · Wed, 18 Sep 2024 18:37:16 +0200

On 9/18/24 10:12 AM, Matthew Wilcox wrote:
> On Wed, Sep 18, 2024 at 03:51:39PM +0200, Linus Torvalds wrote:
>> On Wed, 18 Sept 2024 at 15:35, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>
>>> Oh god, that's it.
>>>
>>> There should have been an xas_reset() after calling xas_split_alloc().
>>
>> I think it is worse than that.
>>
>> Even *without* an xas_split_alloc(), I think the old code was wrong,
>> because it drops the xas lock without doing the xas_reset.
> 
> That's actually OK.  The first time around the loop, we haven't walked the
> tree, so we start from the top as you'd expect.  The only other reason to
> go around the loop again is that memory allocation failed for a node, and
> in that case we call xas_nomem() and that (effectively) calls xas_reset().
> 
> So in terms of the expected API for xa_state users, it would be consistent
> for xas_split_alloc() to call xas_reset().
> 
> You might argue that this API is too subtle, but it was intended to
> be easy to use.  The problem was that xas_split_alloc() got added much
> later and I forgot to maintain the invariant that makes it work as well
> as be easy to use.
> 

Ok, missing xas_reset() makes a ton of sense as the root cause, and it
also explains why tmpfs hasn't seen the problem.
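
For anyone following along, here is a toy userspace model of the general
hazard: a cursor that caches where its last walk ended, and a tree that
changes underneath it. This is only a sketch and not the kernel code.
Every name in it (toy_node, toy_tree, toy_cursor, toy_walk, toy_reset,
toy_replace_root, toy_store) is made up for illustration, and the way
the tree changes here is an assumption that stands in for whatever
really happens around xas_split_alloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a tree node; NOT the kernel's struct xa_node. */
struct toy_node {
	int value;
	bool live;	/* false once the node has been replaced in the tree */
};

/* Toy tree with a single node hanging off the root. */
struct toy_tree {
	struct toy_node *root;
};

/* Toy stand-in for an xa_state: remembers where the last walk ended. */
struct toy_cursor {
	struct toy_tree *tree;
	struct toy_node *cached;	/* NULL means "walk from the top" */
};

/* Walk the tree, reusing the cached position if there is one. */
static struct toy_node *toy_walk(struct toy_cursor *cur)
{
	assert(cur->tree != NULL);
	if (cur->cached == NULL)
		cur->cached = cur->tree->root;
	return cur->cached;
}

/* Forget the cached position so the next walk starts from the top. */
static void toy_reset(struct toy_cursor *cur)
{
	cur->cached = NULL;
}

/* Model the tree changing while the cursor's owner is not looking. */
static void toy_replace_root(struct toy_tree *tree, struct toy_node *fresh)
{
	tree->root->live = false;
	fresh->live = true;
	tree->root = fresh;
}

/*
 * Store through the cursor. Returns true if the value landed in a node
 * that is still part of the tree, false if it went into a stale node.
 */
static bool toy_store(struct toy_cursor *cur, int value)
{
	struct toy_node *node = toy_walk(cur);

	node->value = value;
	return node->live;
}
```

In this model, a store, then toy_replace_root(), then a second store
through the same cursor writes into the old node and returns false: the
data never reaches the live tree. Calling toy_reset() before the second
store makes it walk from the top again and land in the new node.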

We'll start validating 6.11 and make noise if the large folios cause
problems again.  Thanks everyone!

-chris