Re: [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait

Goldwyn Rodrigues <rgoldwyn@xxxxxxx> · Wed, 9 Aug 2017 17:16:23 -0500

On 08/09/2017 03:21 PM, Shaohua Li wrote:
> On Wed, Aug 09, 2017 at 10:35:39AM -0500, Goldwyn Rodrigues wrote:
>>
>>
>> On 08/09/2017 10:02 AM, Shaohua Li wrote:
>>> On Wed, Aug 09, 2017 at 06:44:55AM -0500, Goldwyn Rodrigues wrote:
>>>>
>>>>
>>>> On 08/08/2017 03:32 PM, Shaohua Li wrote:
>>>>> On Wed, Jul 26, 2017 at 06:57:58PM -0500, Goldwyn Rodrigues wrote:
>>>>>> From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
>>>>>>
>>>>>> Nowait is a feature of direct AIO, where users can request
>>>>>> to return immediately if the I/O is going to block. This translates
>>>>>> to REQ_NOWAIT in bio.bi_opf flags. While request based devices
>>>>>> don't wait, stacked devices such as md/dm will.
>>>>>>
>>>>>> In order to explicitly mark stacked devices as supported, we
>>>>>> set the QUEUE_FLAG_NOWAIT in the queue_flags and return -EAGAIN
>>>>>> whenever the device would block.
>>>>>
>>>>> probably you should route this patch to Jens first, DM/MD are different trees.
>>>>
>>>> Yes, I have sent it to linux-block as well, and he has commented as well.
>>>>
>>>>
>>>>>  
>>>>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
>>>>>> ---
>>>>>>  block/blk-core.c       | 3 ++-
>>>>>>  include/linux/blkdev.h | 2 ++
>>>>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>>> index 970b9c9638c5..1c9a981d88e5 100644
>>>>>> --- a/block/blk-core.c
>>>>>> +++ b/block/blk-core.c
>>>>>> @@ -2025,7 +2025,8 @@ generic_make_request_checks(struct bio *bio)
>>>>>>  	 * if queue is not a request based queue.
>>>>>>  	 */
>>>>>>  
>>>>>> -	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
>>>>>> +	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q) &&
>>>>>> +	    !blk_queue_supports_nowait(q))
>>>>>>  		goto not_supported;
>>>>>>  
>>>>>>  	part = bio->bi_bdev->bd_part;
>>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>>>> index 25f6a0cb27d3..fae021ebec1b 100644
>>>>>> --- a/include/linux/blkdev.h
>>>>>> +++ b/include/linux/blkdev.h
>>>>>> @@ -633,6 +633,7 @@ struct request_queue {
>>>>>>  #define QUEUE_FLAG_REGISTERED  29	/* queue has been registered to a disk */
>>>>>>  #define QUEUE_FLAG_SCSI_PASSTHROUGH 30	/* queue supports SCSI commands */
>>>>>>  #define QUEUE_FLAG_QUIESCED    31	/* queue has been quiesced */
>>>>>> +#define QUEUE_FLAG_NOWAIT      32	/* stack device driver supports REQ_NOWAIT */
>>>>>>  
>>>>>>  #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>>>>>>  				 (1 << QUEUE_FLAG_STACKABLE)	|	\
>>>>>> @@ -732,6 +733,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
>>>>>>  #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
>>>>>>  #define blk_queue_scsi_passthrough(q)	\
>>>>>>  	test_bit(QUEUE_FLAG_SCSI_PASSTHROUGH, &(q)->queue_flags)
>>>>>> +#define blk_queue_supports_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
>>>>>
>>>>> Should this bit consider under layer disks? For example, one raid array disk
>>>>> doesn't support NOWAIT, shouldn't we disable NOWAIT for the array?
>>>>
>>>> Yes, it should. I will add a check before setting the flag. Thanks.
>>>> Request-based devices don't wait. So, they would not have this flag set.
>>>> It is only the bio-based, with the  make_request_fn hook which need this.
>>>>
>>>>>  
>>>>> I have another generic question. If a bio is splitted into 2 bios, one bio
>>>>> doesn't need to wait but the other need to wait. We will return -EAGAIN for the
>>>>> second bio, so the whole bio will return -EAGAIN, but the first bio is already
>>>>> dispatched to disk. Is this correct behavior?
>>>>>
>>>>
>>>> No, from a multi-device point of view, this is inconsistent. I have
>>>> tried the request bio returns -EAGAIN before the split, but I shall
>>>> check again. Where do you see this happening?
>>>
>>> No, this isn't multi-device specific, any driver can do it. Please see blk_queue_split.
>>>
>>
>> In that case, the bio end_io function is chained and the bio of the
>> split will replicate the error to the parent (if not already set).
> 
> this doesn't answer my question. So if a bio returns -EAGAIN, part of the bio
> probably already dispatched to disk (if the bio is splitted to 2 bios, one
> returns -EAGAIN, the other one doesn't block and dispatch to disk), what will
> application be going to do? I think this is different to other IO errors. FOr
> other IO errors, application will handle the error, while we ask app to retry
> the whole bio here and app doesn't know part of bio is already written to disk.

It is the same as for other I/O errors as well, such as EIO. You do not
know which bio of all submitted bio's returned the error EIO. The
application would and should consider the whole I/O as failed.

The user application does not know of bios, or how it is going to be
split in the underlying layers. It knows at the system call level. In
this case, the EAGAIN will be returned to the user for the whole I/O not
as a part of the I/O. It is up to application to try the I/O again with
or without RWF_NOWAIT set. In direct I/O, it is bubbled out using
dio->io_error. You can read about it at the patch header for the initial
patchset at [1].

Use case: It is for applications having two threads, a compute thread
and an I/O thread. It would try to push AIO as much as possible in the
compute thread using RWF_NOWAIT, and if it fails, would pass it on to
I/O thread which would perform without RWF_NOWAIT. End result if done
right is you save on context switches and all the
synchronization/messaging machinery to perform I/O.

[1] http://marc.info/?l=linux-block&m=149789003305876&w=2

-- 
Goldwyn