On 2019/08/06 9:25, Dave Chinner wrote: > On Tue, Aug 06, 2019 at 12:05:51AM +0000, Damien Le Moal wrote: >> On 2019/08/06 7:05, Damien Le Moal wrote: >>> On 2019/08/06 6:59, Damien Le Moal wrote: >>>> On 2019/08/06 6:28, Jens Axboe wrote: >>>>> On 8/5/19 2:27 PM, Damien Le Moal wrote: >>>>>> On 2019/08/06 6:26, Jens Axboe wrote: >>>>>>>> In any case, looking again at this code, it looks like there is a >>>>>>>> problem with dio->size being incremented early, even for fragments >>>>>>>> that get BLK_QC_T_EAGAIN, because dio->size is being used in >>>>>>>> blkdev_bio_end_io(). So an incorrect size can be reported to user >>>>>>>> space in that case on completion (e.g. large asynchronous no-wait dio >>>>>>>> that cannot be issued in one go). >>>>>>>> >>>>>>>> So maybe something like this ? (completely untested) >>>>>>> >>>>>>> I think that looks pretty good, I like not double accounting with >>>>>>> this_size and dio->size, and we retain the old style ordering for the >>>>>>> ret value. >>>>>> >>>>>> Do you want a proper patch with real testing backup ? I can send that >>>>>> later today. >>>>> >>>>> Yeah that'd be great, I like your approach better. >>>>> >>>> >>>> Looking again, I think this is not it yet: dio->size is being referenced after >>>> submit_bio(), so blkdev_bio_end_io() may see the old value if the bio completes >>>> before dio->size increment. So the use-after-free is still there. And since >>>> blkdev_bio_end_io() processes completion to user space only when dio->ref >>>> becomes 0, adding an atomic_inc/dec(&dio->ref) over the loop would not help and >>>> does not cover the single BIO case. Any idea how to address this one ? >>>> >>> >>> May be add a bio_get/put() over the 2 places that do submit_bio() would work, >>> for all cases (single/multi BIO, sync & async). E.g.: >>> >>> + bio_get(bio); >>> qc = submit_bio(bio); >>> if (qc == BLK_QC_T_EAGAIN) { >>> if (!dio->size) >>> ret = -EAGAIN; >>> + bio_put(bio); >>> goto error; >>> } >>> dio->size += bio_size; >>> + bio_put(bio); >>> >>> Thoughts ? >>> >> >> That does not work since the reference to dio->size in blkdev_bio_end_io() >> depends on atomic_dec_and_test(&dio->ref) which counts the BIO fragments for the >> dio (+1 for async multi-bio case). So completion of the last bio can still >> reference the old value of dio->size. > > Didn't we fix this same use-after-free in iomap_dio_rw() in commit > 4ea899ead278 ("iomap: fix a use after free in iomap_dio_rw")? Not so similar in this case with raw block device dio. But I will look more into it for inspiration. Thank you for the pointer. > > Cheers, > > Dave. > -- Damien Le Moal Western Digital Research