However, I'm still seeing a use-after-free error in the request
completion path in nvme_unmap_data(). It happens only when testing with
large block sizes in fio, typically > 128k, e.g. bs=256k will always hit it.
This is the error:
DMA-API: nvme 0000:00:04.0: device driver tries to free DMA memory it
has not allocated [device address=0x6b6b6b6b6b6b6b6b] [size=1802201963
bytes]
and this warning occasionally:
WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
It seems like a request might be issued multiple times but I can't see
anything in io_uring code that would account for it.
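
A note on reading the DMA-API message: both the device address and the size look like memory filled with the byte 0x6b, which is the value the kernel uses to poison freed slab objects (POISON_FREE). The sketch below only checks that arithmetic. The helper name poisoned_value() is hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* 0x6b is the slab free-poison byte (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE_BYTE 0x6bULL

/* Hypothetical helper: the value an nbytes-wide field reads back as
 * once the object holding it has been freed and poisoned. */
static uint64_t poisoned_value(unsigned int nbytes)
{
	uint64_t v = 0;
	unsigned int i;

	assert(nbytes <= 8);
	for (i = 0; i < nbytes; i++)
		v = (v << 8) | POISON_FREE_BYTE;
	return v;
}
```

poisoned_value(8) reproduces the reported device address and poisoned_value(4) reproduces the reported size, which fits the reading that the DMA mapping was taken from an already freed request.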
Both of them indicate reuse, and I agree I don't think it's io_uring. It
really feels like an issue with nvme when a poll queue is shared, but I
haven't been able to pinpoint what it is yet.
The 128K is interesting, that would seem to indicate that it's related to
splitting of the IO (which would create > 1 IO per submitted IO).
Where does the split take place? I had suspected that it might be
related to the submit_bio() loop in __blkdev_direct_IO() but I don't
think I saw multiple submit_bio() calls or maybe I missed something.
See the path from blk_mq_make_request() -> __blk_queue_split() ->
blk_bio_segment_split(). The bio is built and submitted, then split if
it violates any size constraints. The splits are submitted through
generic_make_request(), so that might be why you didn't see multiple
submit_bio() calls.
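
To illustrate the splitting hypothesis, here is a minimal sketch of how many pieces one submitted bio would become under a single size limit. The 128k limit is an assumption taken from the numbers in this thread; the real split in blk_bio_segment_split() also depends on segment counts and other queue limits. split_count() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of pieces a bio of `bytes` is cut into
 * when no piece may exceed `max_bytes`. Models only the size limit. */
static size_t split_count(size_t bytes, size_t max_bytes)
{
	assert(max_bytes > 0);
	if (bytes == 0)
		return 0;
	return (bytes + max_bytes - 1) / max_bytes;	/* round up */
}
```

With an assumed 128k limit, bs=256k gives two pieces per submitted IO while bs=4k and bs=128k stay at one.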
I think the problem is in __blkdev_direct_IO() and not related to
request size:
    qc = submit_bio(bio);
    if (polled)
        WRITE_ONCE(iocb->ki_cookie, qc);
The first call to submit_bio() when dio->is_sync is not set won't have
acquired a bio ref through bio_get() and so the bio/dio could be freed
when ki_cookie is set.
With the specific io_uring test, this happens because
blk_mq_make_request()->blk_mq_get_request() fails and so terminates the
request.
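
The ordering problem can be modelled in userspace. This is only a sketch of the reasoning above, not kernel code: struct model_bio, model_bio_put() and cookie_store_hits_freed() are hypothetical names, and a flag stands in for the actual free so the model itself stays well defined.

```c
#include <assert.h>
#include <stdbool.h>

struct model_bio {
	int refs;	/* stands in for the bio reference count */
	bool freed;	/* set when the last reference is dropped */
};

static void model_bio_put(struct model_bio *bio)
{
	assert(bio->refs > 0);
	if (--bio->refs == 0)
		bio->freed = true;
}

/* Returns true if the store to ki_cookie after submit would touch a
 * bio/dio that has already been freed. */
static bool cookie_store_hits_freed(bool take_extra_ref, bool completes_in_submit)
{
	struct model_bio bio = { .refs = 1, .freed = false };
	bool hit;

	if (take_extra_ref)
		bio.refs++;		/* models bio_get() before submit */

	if (completes_in_submit)
		model_bio_put(&bio);	/* models completion dropping its ref */

	hit = bio.freed;		/* WRITE_ONCE(iocb->ki_cookie, qc) lands here */

	if (take_extra_ref)
		model_bio_put(&bio);	/* submitter drops its own reference */
	return hit;
}
```

Only the combination of no extra reference and a completion (or failure) inside the submit call reports a hit, which matches the case described above.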
As for the fix for polled io (!is_sync) case, I'm wondering if
dio->multi_bio is really necessary in __blkdev_direct_IO(). Can we call
bio_get() unconditionally after the call to bio_alloc_bioset(), set
dio->ref = 1, and increment it for additional submit bio calls? Would
it make sense to do away with multi_bio?
It's not ideal, but not sure I see a better way to fix it. You see the
case on failure, which we could check for (don't write cookie if it's
invalid). But this won't fix the case where the IO completes fast, or
even immediately.
Hence I think you're right, there's really no way around doing the bio
ref counting, even for the sync case. Care to cook up a patch we can
take a look at? I can run some high performance sync testing too, so we
can see how badly it might hurt.
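
A rough model of the counting scheme being proposed (ref starts at 1 for the first bio, is incremented for each additional submitted bio, and each completion drops one) might look like the following. The names are hypothetical, and a plain int is used because the model is single-threaded; kernel code would need an atomic counter.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dio {
	int ref;	/* plain int in this model; the kernel would use an atomic */
	bool done;	/* set by the completion that drops the last reference */
};

static void model_dio_init(struct model_dio *dio)
{
	dio->ref = 1;		/* covers the first bio */
	dio->done = false;
}

static void model_dio_get(struct model_dio *dio)
{
	dio->ref++;		/* one more bio in flight */
}

/* Returns true only for the put that releases the last reference. */
static bool model_dio_put(struct model_dio *dio)
{
	assert(dio->ref > 0);
	if (--dio->ref == 0) {
		dio->done = true;
		return true;
	}
	return false;
}

/* Submit nr_bios bios, complete them one by one, and return the index
 * (1-based) of the completion that finished the dio. */
static int model_completions_until_done(int nr_bios)
{
	struct model_dio dio;
	int finished_at = 0;
	int i;

	model_dio_init(&dio);
	for (i = 1; i < nr_bios; i++)
		model_dio_get(&dio);
	for (i = 1; i <= nr_bios; i++)
		if (model_dio_put(&dio) && finished_at == 0)
			finished_at = i;
	return finished_at;
}
```

In this model the dio is finished by the last completion and by no earlier one, for both the single-bio and the multi-bio case, without a separate multi_bio flag.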
Sure, I'll take a stab at it.
Thanks!
I sent it out. When I tested with next-20200114, the fio test ran ok
for sync/async with 4k. The sync test ran ok with 256k as well but I
still hit the original use-after-free bug with 256k.
With next-20200130 however, I'm hitting the use-after-free bug even with
4k so it is not a size related issue.
I wasn't sure how to force a multi-bio case so that hasn't been tested.
Also, a question about the code below in io_complete_rw_iopoll():
    if (res != req->result)
        req_set_fail_links(req);
req->result could be set to the size of the completed io request, is the
check ok in that case?
Also, I'm not clear on how the is_sync + multi_bio case is supposed to work.
__blkdev_direct_IO() polls for *a* completion in the request's hctx and
not *the* request completion itself, so what does that tell us for
multi_bio + is_sync? Is the polling supposed to guarantee that all
constituent bios for a multi_bio request have completed before return?
The polling really just ignores that, it doesn't take multi requests
into account. We just poll for the first part of it.
In a multi-bio case, I think it would poll for the last part of it, I
haven't changed that. I did add a check for a valid cookie since I
think it would loop forever in that case.
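
The effect of the cookie check can be sketched with a bounded poll loop. Everything here is a model: model_qc_t, MODEL_QC_NONE and model_poll() are hypothetical stand-ins, and the spin limit exists only so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t model_qc_t;

/* Hypothetical stand-in for an invalid cookie value. */
#define MODEL_QC_NONE ((model_qc_t)-1)

static bool model_qc_valid(model_qc_t cookie)
{
	return cookie != MODEL_QC_NONE;
}

/* Poll until the IO completes (after `completes_after` spins) or the
 * spin budget runs out. Returns the number of spins used, or -1 when
 * the cookie is invalid and polling is skipped. */
static int model_poll(model_qc_t cookie, int completes_after, int max_spins)
{
	int spins = 0;

	assert(max_spins >= 0);
	if (!model_qc_valid(cookie))
		return -1;	/* nothing to poll for; do not spin */

	while (spins < max_spins) {
		spins++;
		if (spins >= completes_after)
			return spins;
	}
	return max_spins;	/* gave up without seeing a completion */
}
```

Without the validity check, an invalid cookie would give the loop nothing that can ever complete, which is the endless loop mentioned above.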