Re: [PATCH] block: support zone append bvecs

On 2021/03/24 19:09, Johannes Thumshirn wrote:
> On 24/03/2021 08:13, Johannes Thumshirn wrote:
>> On 24/03/2021 08:11, Christoph Hellwig wrote:
>>> On Wed, Mar 24, 2021 at 07:07:27AM +0000, Johannes Thumshirn wrote:
>>>>>>  	if (iov_iter_is_bvec(iter)) {
>>>>>> -		if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
>>>>>> -			return -EINVAL;
>>>>>> +		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
>>>>>> +			struct request_queue *q = bio->bi_bdev->bd_disk->queue;
>>>>>> +			unsigned int max_append =
>>>>>> +				queue_max_zone_append_sectors(q) << 9;
>>>>>> +
>>>>>> +			if (WARN_ON_ONCE(iter->count > max_append))
>>>>>> +				return -EINVAL;
>>>>>> +		}
>>>>>
>>>>> That is not correct.  bio_iov_iter_get_pages just fills the bio as far
>>>>> as we can, and then returns 0 for the next call to continue.  Basically
>>>>> what you want here is a partial version of bio_iov_bvec_set.
>>>>>
>>>>
>>>> Isn't that what I did? The above checks if we have a REQ_OP_ZONE_APPEND bio and
>>>> returns -EINVAL if iter->count is bigger than queue_max_zone_append_sectors().
>>>> If the check doesn't fail, it's going to call bio_iov_bvec_set().
>>>
>>> And that is the problem.  It should not fail, the payload is decoupled
>>> from the max_append size.
>>>
>>> Doing the proper thing is not too hard as described above - make sure
>>> the bi_iter points to only the chunk of iter passed in that fits, and
>>> only advance the passed in iter by that amount.
>>>
>>
>> Ah got it now,
>> thanks
>>
> 
> Stupid question, but wouldn't it be sufficient if I did (this can still be
> simplified):
> 
> diff --git a/block/bio.c b/block/bio.c
> index 26b7f721cda8..9c529b2db8fa 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -964,6 +964,16 @@ static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
>         return 0;
>  }
>  
> +static int bio_iov_append_bvec_set(struct bio *bio, struct iov_iter *iter)
> +{
> +       struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +       unsigned int max_append = queue_max_zone_append_sectors(q) << 9;
> +
> +       iov_iter_truncate(iter, max_append);

Why truncate? The caller of bio_iov_iter_get_pages() will see a shorter iter
after this returns and will not loop over to send the remaining data.

I think you need a special bio_iov_bvec_set(), or you need to patch it so that
the bio size is at most max_append instead of iter->count. Same for the segment
count: the bio must be set up with the number of segments covering up to
max_append instead of iter->nr_segs.

Then you can do:

iov_iter_advance(iter, max_append);

before returning, like bio_iov_bvec_set() does. No?

> +
> +       return bio_iov_bvec_set(bio, iter);
> +}
> +
>  #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
>  
>  /**
> @@ -1094,8 +1104,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>         int ret = 0;
>  
>         if (iov_iter_is_bvec(iter)) {
> -               if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
> -                       return -EINVAL;
> +               if (bio_op(bio) == REQ_OP_ZONE_APPEND)
> +                       return bio_iov_append_bvec_set(bio, iter);
>                 return bio_iov_bvec_set(bio, iter);
>         }
> 
> The above had a successful xfstests run and several fio --ioengine=splice runs.
> 


-- 
Damien Le Moal
Western Digital Research



