Re: [PATCH 1/4] xfs: force all buffers to be written during btree bulk load

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Mon, 27 Nov 2023 23:13:40 -0800

> > If my understanding above is correct this just papers over the bug
> > that a buffer that is marked stale and can be reused for something
> > else is left on a delwri list.
> 
> I'm not sure it's a bug for *most* code that encounters it.  Regular
> transactions don't directly use the delwri lists, so it will never see
> that at all. If the buffer gets rewritten and logged, then it'll just
> end up on the AIL's delwri buffer list again.

Where "regular transactions" means data I/O, yes.  All normal metadata
I/O eventually ends up on a delwri list.  And we want it on that delwri
list after actually updating the contents.

> Every time I revisit this patch I wonder if DELWRI_Q is misnamed -- does
> it mean "b_list is active"?  Or does it merely mean "another thread will
> write this buffer via b_list if nobody gets there first"?  I think it's
> the second, since it's easy enough to list_empty().

I think it is misnamed and not clearly defined, and we should probably
fix that.  Note that least the current list_empty() usage isn't quite
correct either without a lock held by the delwri list owner.  We at
least need a list_empty_careful for a racey but still save check.

Thinking out loud I wonder if we can just kill XBF_DELWRI_Q entirely.
Except for various asserts we mostly use it in reverse polarity, that is
to cancel delwri writeout for stale buffers.  How about just skipping
XBF_STALE buffers on the delwri list directly and not bother with
DELWRI_Q?  With that we can then support multiple delwri lists that
don't get into each others way using say an allocating xarray instead
of the invase list and avoid this whole mess.

Let me try to prototype this..

> It's only repair that requires this new behavior that all new buffers
> have to be written through the delwri list, and only because it actually
> /can/ cancel the transaction by finishing the autoreap EFIs.

Right now yes.  But I think the delwri behavior is a land mine, and
this just happens to be the first user to trigger it.  Edit: while
looking through the DELWRI_Q usage I noticed xfs_qm_flush_one, which
seems to deal with what is at least a somewhat related issue based
on the comments there.

> Way back when I first discovered this, my first impulse was to make
> xfs_buf_stale wait for DELWRI_Q to clear.  That IIRC didn't work because
> a transaction could hold an inode that the AIL will need to lock.  I
> think xfs_buf_find_lock would have the same problem.

Yes, that makes sense.  Would be great to document such details in the
commit message..