Re: Block device direct read EIO handling broken?

On 2019/08/06 6:28, Jens Axboe wrote:
> On 8/5/19 2:27 PM, Damien Le Moal wrote:
>> On 2019/08/06 6:26, Jens Axboe wrote:
>>>> In any case, looking again at this code, it looks like there is a
>>>> problem with dio->size being incremented early, even for fragments
>>>> that get BLK_QC_T_EAGAIN, because dio->size is being used in
>>>> blkdev_bio_end_io(). So an incorrect size can be reported to user
>>>> space in that case on completion (e.g. large asynchronous no-wait dio
>>>> that cannot be issued in one go).
>>>>
>>>> So maybe something like this? (completely untested)
>>>
>>> I think that looks pretty good, I like not double accounting with
>>> this_size and dio->size, and we retain the old style ordering for the
>>> ret value.
>>
>> Do you want a proper patch, backed by real testing? I can send that
>> later today.
> 
> Yeah that'd be great, I like your approach better.
> 

Looking again, I think this is not it yet: dio->size is referenced after
submit_bio(), so blkdev_bio_end_io() may see the old value if the bio
completes before the dio->size increment. And since the final completion
frees dio, the use-after-free is still there as well. Furthermore, because
blkdev_bio_end_io() reports the completion to user space only when dio->ref
drops to 0, adding an atomic_inc/dec(&dio->ref) around the loop would not
help, and it would not cover the single-BIO case anyway. Any ideas on how
to address this one?

-- 
Damien Le Moal
Western Digital Research



