Re: [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait

Goldwyn Rodrigues <rgoldwyn@xxxxxxx> · Thu, 10 Aug 2017 06:49:53 -0500

On 08/09/2017 09:17 PM, Jens Axboe wrote:
> On 08/09/2017 08:07 PM, Goldwyn Rodrigues wrote:
>>>>>>>> No, from a multi-device point of view, this is inconsistent. I
>>>>>>>> have tried the request bio returns -EAGAIN before the split, but
>>>>>>>> I shall check again. Where do you see this happening?
>>>>>>>
>>>>>>> No, this isn't multi-device specific, any driver can do it.
>>>>>>> Please see blk_queue_split.
>>>>>>>
>>>>>>
>>>>>> In that case, the bio end_io function is chained and the bio of
>>>>>> the split will replicate the error to the parent (if not already
>>>>>> set).
>>>>>
>>>>> this doesn't answer my question. So if a bio returns -EAGAIN, part
>>>>> of the bio probably already dispatched to disk (if the bio is
>>>>> splitted to 2 bios, one returns -EAGAIN, the other one doesn't
>>>>> block and dispatch to disk), what will application be going to do?
>>>>> I think this is different to other IO errors. FOr other IO errors,
>>>>> application will handle the error, while we ask app to retry the
>>>>> whole bio here and app doesn't know part of bio is already written
>>>>> to disk.
>>>>
>>>> It is the same as for other I/O errors as well, such as EIO. You do
>>>> not know which bio of all submitted bio's returned the error EIO.
>>>> The application would and should consider the whole I/O as failed.
>>>>
>>>> The user application does not know of bios, or how it is going to be
>>>> split in the underlying layers. It knows at the system call level.
>>>> In this case, the EAGAIN will be returned to the user for the whole
>>>> I/O not as a part of the I/O. It is up to application to try the I/O
>>>> again with or without RWF_NOWAIT set. In direct I/O, it is bubbled
>>>> out using dio->io_error. You can read about it at the patch header
>>>> for the initial patchset at [1].
>>>>
>>>> Use case: It is for applications having two threads, a compute
>>>> thread and an I/O thread. It would try to push AIO as much as
>>>> possible in the compute thread using RWF_NOWAIT, and if it fails,
>>>> would pass it on to I/O thread which would perform without
>>>> RWF_NOWAIT. End result if done right is you save on context switches
>>>> and all the synchronization/messaging machinery to perform I/O.
>>>>
>>>> [1] http://marc.info/?l=linux-block&m=149789003305876&w=2
>>>
>>> Yes, I knew the concept, but I didn't see previous patches mentioned
>>> the -EAGAIN actually should be taken as a real IO error. This means a
>>> lot to applications and make the API hard to use. I'm wondering if we
>>> should disable bio split for NOWAIT bio, which will make the -EAGAIN
>>> only mean 'try again'.
>>
>> Don't take it as EAGAIN, but read it as EWOULDBLOCK. Why do you say
>> the API is hard to use? Do you have a case to back it up?
> 
> Because it is hard to use, and potentially suboptimal. Let's say you're
> doing a 1MB write, we hit EWOULDBLOCK for the last split. Do we return a
> short write, or do we return EWOULDBLOCK? If the latter, then that
> really sucks from an API point of view.
> 
>> No, not splitting the bio does not make sense here. I do not see any
>> advantage in it, unless you can present a case otherwise.
> 
> It ties back into the "hard to use" that I do agree with IFF we don't
> return the short write. It's hard for an application to use that
> efficiently, if we write 1MB-128K but get EWOULDBLOCK, the re-write the
> full 1MB from a different context.
> 

It returns the error code only and not short reads/writes. But isn't
that true for all system calls in case of error?

For aio, there are two result fields in io_event out of which one could
be used for error while the other be used for amount of writes/reads
performed. However, only one is used. This will not work with
pread()/pwrite() calls though because of the limitation of return values.

Finally, what if the EWOULDBLOCK is returned for an earlier bio (say
offset 128k) for a 1MB pwrite(), while the rest of the 7 128K are
successful. What short return value should the system call return?

-- 
Goldwyn

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel