Re: [PATCH 1/7] zonefs: Detect append writes at invalid locations

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Thu, 12 Jan 2023 17:26:00 +0900

On 1/12/23 16:33, Johannes Thumshirn wrote:
> On 12.01.23 06:18, Damien Le Moal wrote:
>> On 1/12/23 07:08, Damien Le Moal wrote:
>>> On 1/11/23 21:23, Johannes Thumshirn wrote:
>>>> On 10.01.23 14:08, Damien Le Moal wrote:
>>>>> Using REQ_OP_ZONE_APPEND operations for synchronous writes to sequential
>>>>> files succeeds regardless of the zone write pointer position, as long as
>>>>> the target zone is not full. This means that if an external (buggy)
>>>>> application writes to the zone of a sequential file underneath the file
>>>>> system, subsequent file write() operation will succeed but the file size
>>>>> will not be correct and the file will contain invalid data written by
>>>>> another application.
>>>>>
>>>>> Modify zonefs_file_dio_append() to check the written sector of an append
>>>>> write (returned in bio->bi_iter.bi_sector) and return -EIO if there is a
>>>>> mismatch with the file zone wp offset field. This change triggers a call
>>>>> to zonefs_io_error() and a zone check. Modify zonefs_io_error_cb() to
>>>>> not expose the unexpected data after the current inode size when the
>>>>> errors=remount-ro mode is used. Other error modes are correctly handled
>>>>> already.
>>>>
>>>> This only happens on ZNS and null_blk, doesn't it? On SCSI the Zone Append
>>>> emulation should catch this error before.
>>>
>>> Yes. The zone append will fail with scsi sd emulation so this change is
>>> not useful in that case. But null_blk and ZNS drives will not fail the
>>> zone append, resulting in a bad file size without this check.
>>
>> Let me retract that... For scsi sd, the zone append emulation will do its
>> job if an application writes to a zone under the file system. Then zonefs
>> issuing a zone append will also succeed and we end up with the bad inode size.
>>
>> We would get a zone append failure in zonefs with scsi sd if the
>> corruption to the zone was done with a passthrough command as these will
>> not result in sd_zbc zone write pointer tracking doing its job.
>>
> 
> But then the error recovery procedures in sd_zbc should come into place.

Yes, for the passthrough case, because it ends up generating a failed
write, which zonefs catches and handles. All good in that case. It remains
that the patch is also necessary for scsi if the corruption happens with
something like dd, which will be seem by sd_zbc. So subsequent zone append
from zonefs will succeed, even though zonefs is not aware that the wp
changed...

> 
> Anyways for ZNS and null_blk this is an improvement:
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>

-- 
Damien Le Moal
Western Digital Research