On 2/3/2020 12:34 AM, Christoph Hellwig wrote:
On Fri, Jan 31, 2020 at 06:08:16PM +0000, Bijan Mottahedeh wrote:
I think the problem is that in the async case, bio_get() is not called for
the initial bio *before* the submit_bio() call for that bio:
if (dio->is_sync) {
dio->waiter = current;
bio_get(bio);
} else {
dio->iocb = iocb;
}
The bio_get() call for the async case happens too late, after the
submit_bio() call:
if (!dio->multi_bio) {
/*
* AIO needs an extra reference to ensure the dio
* structure which is embedded into the first bio
* stays around.
*/
if (!is_sync)
bio_get(bio);
dio->multi_bio = true;
atomic_set(&dio->ref, 2);
} else {
atomic_inc(&dio->ref);
}
That actualy is before the submit_bio call, which is just below that
code.
See my previous message on the mailing list titled "io_uring: acquire
ctx->uring_lock before calling io_issue_sqe()" for the more details but
basically blkdev_bio_end_io() can be called before submit_bio() returns and
therefore free the initial bio. I think it is the unconditional bio_put()
at the end that does it.
But we never access the bio after submit_bio returns for the single
bio async case, so I still don't understand the problem.
Ok, I see your point.
My concern is with the code below for the single bio async case:
qc = submit_bio(bio);
if (polled)
WRITE_ONCE(iocb->ki_cookie, qc);
The bio/dio can be freed before the the cookie is written which is what
I'm seeing, and I thought this may lead to a scenario where that iocb
request could be completed, freed, reallocated, and resubmitted in
io_uring layer; i.e., I thought the cookie could be written into the
wrong iocb.
So I have two questions:
Could we ever update a wrong iocb?
Is the additional bio_get() the right way to mitigate that? I see that
this might not be true since end_io() calls ki_complete() regardless.
Thanks.
--bijan