Re: XBF_DONE semantics

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 29 Nov 2023 19:21:30 +1100

On Wed, Nov 29, 2023 at 07:18:05AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 29, 2023 at 08:08:37AM +1100, Dave Chinner wrote:
> > > But places that use buf_get and manually fill in data only use it in a
> > > few cases. 
> > 
> > Yes. the caller of buf_get always needs to set XBF_DONE if it is
> > initialising a new buffer ready for it to be written. It should be
> > done before the caller drops the buffer lock so that no other lookup
> > can see the buffer in the state of "contains valid data but does not
> > have XBF_DONE set".
> 
> That makes sense, but we do have a whole bunch of weird things going
> on as well:
> 
>  - xfs_buf_ioend_handle_error sets XBF_DONE when retrying or failing

Write path. It is expected that XBF_DONE is set prior to submitting
the write, so this should be a no-op. It's probably detritus from
the repeated factoring of the buf_ioend code over the years that
we've never looked deeply into.

>  - xfs_buf_ioend sets XBF_DONE on successful write completion as well

Same. It's really only needed on read IO, but I'm guessing that
somewhere along the line it was done for all IO and we've never
stopped doing that.

>  - xfs_buf_ioend_fail drops XBF_DONE for any I/O failure

That's actually correct.

We're about to toss the buffer because we can't write it back. The
very next function call is xfs_buf_stale(), which invalidates the
buffer and it's contents. It should not have XBF_DONE and XBF_STALE
set at the same time.

It may be worthwhile to move the clearing of XBF_DONE to be inside
xfs_buf_stale(), but I'd have to look at the rest of the code to see
if that's the right thing to do or not.

>  - xfs_do_force_shutdown sets XBF_DONE on the super block buffer on
>    a foced shutdown

No idea why that exists.  I'd have to go code spelunking to find out
what it was needed for.

>  - xfs_trans_get_buf_map sets XBF_DONE on a forced shutdown

That code just looks broken. we're in the middle of a transaction
doing a buffer read, we found a match and raced with a shutdown so
we stale the buffer, mark it donei, bump the recursion count and
return it without an error?

A quick spelunk through history indicates stale vs undelay-write vs
XFS_BUF_DONE was an inconsistent mess. We fixed all the stale vs
done nastiness in the xfs_trans_buf_read() path, but missed this
case in the get path.

As it is, I'd say the way shutdown racing is handled here is broken
and could be fixed by returning -EIO instead of the staled buffer.
The buffer is already linked into the transaction, so the
transaction cancel in response to the IO error will handle the
buffer appropriately....

> So there's definitively a bunch of weird things not fully in line
> with the straight forward answer.

No surprises there - this is a 30 year old code base. Nothing looks
particularly problematic, though.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx