Re: [PATCH] block: deny zone management ioctl on mounted fs

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Fri, 15 May 2020 05:25:14 +0000

On 2020/05/15 14:09, Coly Li wrote:
> On 2020/5/15 12:52, Damien Le Moal wrote:
>> On 2020/05/15 1:26, Johannes Thumshirn wrote:
>>> If a user submits a zone management ioctl from user-space, like a zone
>>> reset and a file-system (like zonefs or f2fs) is mounted on the zoned
>>> block device, the zone will get reset and the file-system's cached value
>>> of the zone's write-pointer becomes invalid.
>>>
>>> Subsequent writes to this zone from the file-system will result in
>>> unaligned writes and the drive will error out.
>>>
>>> Deny zone management ioctls when a super_block is found on the block
>>> device.
>>
>> Zone management ioctls can only be executed by users that have CAP_SYS_ADMIN
>> capabilities. If these start doing stupid things, the system is probably in for
>> a lot of trouble beyond what this patch is trying to prevent.
>>
>> In addition, there are so many other ways that a mounted zoned block device can
>> be corrupted beyond these ioctls that I am not sure this patch is very useful.
>> E.g. any raw block device write in a zone would also cause the FS to see
>> unaligned writes, and any other raw block device access is also possible for
>> CAP_SYS_ADMIN users. Preventing only these ioctls does not really improve
>> anything I think. That may even be harmful, as that would prevent things like
>> inline file system check utilities from running.
>>
>>
> 
> The problem I encountered was that, after I wrote 8KB of data into the seq/0
> file, I wanted to re-write from offset 0. At that moment I didn't know to use
> 'truncate -s 0' to reset the write pointer of this zone file, so I used
> 'blkzone reset' to reset the write pointer of seq zone 0, and I saw the
> write pointer was reset to 0. But I still was not able to write data
> into the seq/0 file at offset 0. Then I decided to reset all the zones with
> the command 'blkzone reset -o 0 -c <zones number>', and the command hung
> for 20+ minutes without responding. In the kernel log I saw quite a lot of
> error messages (an example is on pastebin: https://pastebin.com/ZFFNsaE0)
> 
> In my mind, there are 2 methods to reset a zone: one is from blkzone,
> the other is from truncate on zonefs. I guess I am neither the first nor
> the last one who thinks both methods should work, and who has no idea
> what went wrong when the above errors are encountered.

Well yes, that is correct. Both are methods to reset zones. But for a mounted
disk, any raw block device operation can corrupt the file system on it. That is
a principle that remains true for zoned block devices. Resetting a zone directly
on the device without the FS being aware of the operation will (and does)
corrupt the FS. Same for raw disk writes vs file writes on any mounted disk...
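
To make the failure mode concrete, here is a toy user-space model of what
happens. This is only a sketch, not kernel or zonefs code: struct zone,
struct fs_zone_file and all the function names are made up for the example.
The FS side caches the write pointer; a reset done directly on the device
leaves that cached value stale, so the next FS write is no longer at the
device write pointer and fails as an unaligned write.

```c
#include <stdint.h>

/* Device-side view of one sequential zone (hypothetical model). */
struct zone {
	uint64_t start;	/* first sector of the zone */
	uint64_t len;	/* zone size in sectors */
	uint64_t wp;	/* device write pointer */
};

/* FS-side view: a zone file that caches the write pointer. */
struct fs_zone_file {
	struct zone *z;
	uint64_t cached_wp;
};

/* A sequential zone only accepts writes starting exactly at its wp. */
static int zone_write(struct zone *z, uint64_t sector, uint64_t nr)
{
	if (sector != z->wp || z->wp + nr > z->start + z->len)
		return -1;	/* unaligned write or zone overflow */
	z->wp += nr;
	return 0;
}

/* Raw reset on the device, e.g. what 'blkzone reset' ends up doing. */
static void zone_reset(struct zone *z)
{
	z->wp = z->start;
}

/* FS append: issues the write at the *cached* write pointer. */
static int fs_append(struct fs_zone_file *f, uint64_t nr)
{
	int ret = zone_write(f->z, f->cached_wp, nr);

	if (ret == 0)
		f->cached_wp += nr;
	return ret;
}

/* Reset through the FS (like 'truncate -s 0'): both views stay in sync. */
static void fs_truncate_zero(struct fs_zone_file *f)
{
	zone_reset(f->z);
	f->cached_wp = f->z->start;
}
```

In this model, calling zone_reset() behind the back of fs_append() makes the
next append fail, while fs_truncate_zero() keeps the device and the cached
write pointer consistent.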

> 
> Rejecting the blkzone reset command when the zoned SMR drive is mounted
> with zonefs is OK to me, as it avoids confusion and further mistakes. IMHO,
> this is a solution at least.

libblkid now includes patches supporting zonefs detection, so yes, we can patch
blkzone to reject zone management operations if the device is mounted. We need
the same for f2fs and dm-zoned too. Time to clean that up. Will do.
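
As a rough idea of what such a user-space check could look like (a minimal
sketch only, not the actual blkzone change): the hypothetical helper below
scans a mount table with the glibc getmntent() interface and reports whether
a device path is listed as a mount source. The helper name and the plain
string comparison of device paths are assumptions made for the example; a
real tool would more likely compare device numbers to cope with symlinks and
alternate device names.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if dev is listed as a mount source in the mount table at
 * mounts_path (e.g. "/proc/self/mounts"), 0 if it is not, and -1 if the
 * table cannot be opened.
 * Assumption: device paths are compared as plain strings.
 */
static int device_is_mounted(const char *mounts_path, const char *dev)
{
	struct mntent *ent;
	FILE *fp;
	int found = 0;

	assert(mounts_path && dev);

	fp = setmntent(mounts_path, "r");
	if (!fp)
		return -1;

	/* Walk all entries and stop at the first matching mount source. */
	while ((ent = getmntent(fp)) != NULL) {
		if (strcmp(ent->mnt_fsname, dev) == 0) {
			found = 1;
			break;
		}
	}

	endmntent(fp);
	return found;
}
```

A tool could call device_is_mounted("/proc/self/mounts", devname) before
issuing a zone reset and refuse to proceed on a return value of 1. Note that
a mount table scan like this only covers mounted file systems; a device used
as the backing device of a dm-zoned target would not show up there and needs
a different check.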

> 
> Thanks.
> 
> Coly Li
> 
>>>
>>> Reported-by: Coly Li <colyli@xxxxxxx>
>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>>> ---
>>>
>>> Is there a better way to check for a mounted FS than get_super()/drop_super()?
>>>
>>>  block/blk-zoned.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>> index 23831fa8701d..6923695ec414 100644
>>> --- a/block/blk-zoned.c
>>> +++ b/block/blk-zoned.c
>>> @@ -325,6 +325,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>>>  			   unsigned int cmd, unsigned long arg)
>>>  {
>>>  	void __user *argp = (void __user *)arg;
>>> +	struct super_block *sb;
>>>  	struct request_queue *q;
>>>  	struct blk_zone_range zrange;
>>>  	enum req_opf op;
>>> @@ -345,6 +346,12 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>>>  	if (!(mode & FMODE_WRITE))
>>>  		return -EBADF;
>>>  
>>> +	sb = get_super(bdev);
>>> +	if (sb) {
>>> +		drop_super(sb);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>  	if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
>>>  		return -EFAULT;
>>>  
>>>
>>
>>
> 
> 

-- 
Damien Le Moal
Western Digital Research