Re: [PATCH v2 3/4] block: Prevent potential deadlocks in zone write plug error recovery

Damien Le Moal <dlemoal@xxxxxxxxxx> · Mon, 9 Dec 2024 17:18:00 +0900

On 12/9/24 16:57, Christoph Hellwig wrote:
> On Mon, Dec 09, 2024 at 07:57:57AM +0900, Damien Le Moal wrote:
>> Avoid this problem by completely getting rid of the need for executing a
>> report zone from within the zone write plugging code, instead relying on
>> the user either executing a report zones, resetting the zone or
>> finishing the zone of a failed write. This is not an unresannable
> 
> s/unresannable/unreasonable/ ?

yes.

> 
>> requirement as all well-behaved applications, FSes and device mapper
>> already use report zones to recover from write errors whenever possible.
> 
> I think the real question here is what errors the file system (or other
> submitter) can even recover from.  The next patch deals with the not
> support case for a "special" operation, and that's of course a valid one.

Yep. But even that one is actually coded in scsi to return a -EIO instead of
ENOTSUPP. We can patch that (return ENOTSUPP for an invalid opcode error), but I
am not sure if that is safe to do given that this has been like this for ages.

This is all to say that we cannot even reliably distinguish special/valid error
cases that can be recovered from actual medium/hard errors.

> The first patch already excludes EAGAIN from nowait, and the drivers
> already retry anything that they think is retryable by just resubmitting
> without bubbling it up to the submitter.  That mostly leaves fatal
> media errors as all modern hardware that supports zones just remaps
> on write media failures.  I.e. for those the most sane answer is to
> simply shut down the file system for single-device file systems, or
> treat the device as faulty for multi-device file systems.  This might
> change when we support logical depop on a per-zone basis, but I don't
> think anyone is there yet.  We also really should test this case.
> I'll add a testcase with error injection for zoned xfs, and someone
> should do the same for btrfs (including multi-device handling) and
> f2fs.

I have test cases for zonefs already. That is because zonefs has the
"recover-error" mount option which forces a recovery of a file size (== write
pointer position) if a write fails or is torn. The default even for zonefs is to
go read-only since there is indeed not much we can do about failed writes.

> Sorry for the long rant - not a comment on the code itself but maybe
> the commit log could use a little update.

OK. Will try to improve it.

> Also we probably need to recover this information somewhere in the
> docs.

Hmmm... not sure we have a good place for this. Let me figure out something.

-- 
Damien Le Moal
Western Digital Research