Re: [PATCH 2/2] zonefs: use zone-append for AIO as well

Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx> · Wed, 22 Jul 2020 14:32:26 +0000

On 21/07/2020 15:02, Kanchan Joshi wrote:
> On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:
>> If we get an async I/O iocb with an O_APPEND or RWF_APPEND flag set,
>> submit it using REQ_OP_ZONE_APPEND to the block layer.
>>
>> As an REQ_OP_ZONE_APPEND bio must not be split, this does come with an
>> additional constraint, namely the buffer submitted to zonefs must not be
>> bigger than the max zone append size of the underlying device. For
>> synchronous I/O we don't care about this constraint as we can return short
>> writes, for AIO we need to return an error on too big buffers.
> 
> I wonder what part of the patch implements that constraint on large
> buffer and avoids short-write.
> Existing code seems to trim iov_iter in the outset. 
> 
>         max = queue_max_zone_append_sectors(bdev_get_queue(bdev));
>         max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
>         iov_iter_truncate(from, max);

This actually needs a 'if (sync)' before the iov_iter_truncate() you're right.

> 
> This will prevent large-buffer seeing that error, and will lead to partial write.
> 
>> On a successful completion, the position the data is written to is
>> returned via AIO's res2 field to the calling application.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>> ---
>> fs/zonefs/super.c  | 143 +++++++++++++++++++++++++++++++++++++++------
>> fs/zonefs/zonefs.h |   3 +
>> 2 files changed, 128 insertions(+), 18 deletions(-)
>>
>> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
>> index 5832e9f69268..f155a658675b 100644
>> --- a/fs/zonefs/super.c
>> +++ b/fs/zonefs/super.c
>> @@ -24,6 +24,8 @@
>>
>> #include "zonefs.h"
>>
>> +static struct bio_set zonefs_dio_bio_set;
>> +
>> static inline int zonefs_zone_mgmt(struct zonefs_inode_info *zi,
>> 				   enum req_opf op)
>> {
>> @@ -700,16 +702,71 @@ static const struct iomap_dio_ops zonefs_write_dio_ops = {
>> 	.end_io			= zonefs_file_write_dio_end_io,
>> };
>>
>> +struct zonefs_dio {
>> +	struct kiocb		*iocb;
>> +	struct task_struct	*waiter;
>> +	int			error;
>> +	struct work_struct	work;
>> +	size_t			size;
>> +	u64			sector;
>> +	struct completion	completion;
>> +	struct bio		bio;
>> +};
> 
> How about this (will save 32 bytes) - 
> +struct zonefs_dio {
> +       struct kiocb            *iocb;
> +       struct task_struct      *waiter;
> +       int                     error;
> +	union {
> +       	struct work_struct      work;   //only for async IO
> +       	struct completion       completion; //only for sync IO
> +	};
> +       size_t                  size;
> +       u64                     sector;
> +       struct bio              bio;
> +};
> And dio->error field is not required.
> I see it being used at one place -
> +       ret = zonefs_file_write_dio_end_io(iocb, dio->size,
> +                                          dio->error, 0);
> Here error-code can be picked from dio->bio.
> 

Indeed, did that and also removed the unused completion from 
zonefs_dio and made a union of work and waiter.