On 2/1/19 9:22 AM, Chansol Kim wrote:
On 01/31/19 22:14 PM, Matias Bjørling wrote:
On 1/30/19 2:53 AM, 김찬솔 wrote:
Changes:
1. Function pblk_rw_io to get bio* as a reference
2. In pblk_rw_io bio_put call on read case removed
A fix to address issue where
1. pblk_make_rq calls pblk_rw_io passes bio* pointer as a value (0xA)
2. pblk_rw_io calls blk_queue_split passing bio* pointer as reference
3. In blk_queue_split, when there is a split, the original bio* (0xA)
is passed to generic_make_requests, and the newly allocated bio is
returned
4. If NVM_IO_DONE returned, pblk_make_rq calls bio_endio on the bio*,
that is not the one returned by blk_queue_split
5. As a result bio_endio is not called on the newly allocated bio.
Signed-off-by: chansol.kim <chansol.kim@xxxxxxxxxxx>
---
drivers/lightnvm/pblk-init.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b57f764d..4efc929 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -31,30 +31,24 @@ static DECLARE_RWSEM(pblk_lock);
struct bio_set pblk_bio_set;
static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
- struct bio *bio)
+ struct bio **bio)
{
- int ret;
-
/* Read requests must be <= 256kb due to NVMe's 64 bit completion bitmap
* constraint. Writes can be of arbitrary size.
*/
- if (bio_data_dir(bio) == READ) {
- blk_queue_split(q, &bio);
- ret = pblk_submit_read(pblk, bio);
- if (ret == NVM_IO_DONE && bio_flagged(bio, BIO_CLONED))
- bio_put(bio);
Could we kill the NVM_DONE_IO check in the pblk_rw_io, that should
achieve the same?
I think it is possible to remove NVM_DONE_IO check here. And in that
case perhaps it is necessary to change bio_endio call to somewhere other
than pblk_make_rq, otherwise endio call would not be made to the new
bio*.
Assuming pblk_rw_io's second parameter is to be remained as bio*, There
are three cases I think needs consideration. NVM_IO_ERROR return case,
the read case and the write case.
In NVM_IO_ERROR return case, for both read and write. NVM_IO_ERROR
received by pblk_make_rq and bio_io_error called on bio, since this bio*
that pblk_submit_read and pblk_write_to_cache function tried and failed
might be a new one, so bio_io_error call needs to be made inside
pblk_rw_io.
In read case, there are three sub-cases. The first is All data is available
in ring buffer and NVM_IO_DONE is returned. The second is all to be read
from the device, which currently NVM_IO_OK is returned and endio is
called after read completion from the device. The third is partial read,
where the data that needs to be read from the device is read
synchronously and pblk_rw_io returns NVM_IO_DONE.
In write case, there are two sub-cases. Firstly, non REQ_PRE_FLUSH case,
pblk_write_cache wil return either NVM_IO_DONE or NVM_IO_ERROR. A endio
call is required in place somewhere NVM_IO_DONE is decided.
For REQ_PREFLUSH case bio (new bio* if split) is added to w_ctx.bios,
pblk_write_to_cache will return either NVM_IO_OK or NVM_IO_ERROR. bio*
added to w_ctx.bios will be called by bio_endio on write completion to
the disk. So it is already taken care of.
In summary my feeling is that having pblk_rw_io receive bio* as a
reference and removing bio_put in pblk_rw_io would be the minimum
change. Please share your insight, I will try experimenting alternatives.
What rubs me the wrong way is that that pattern isn't used in the rest
of kernel. I would rather move the calls to bio_io_error and bio_endio
into the pblk_rw_io() function. The implementation of pblk_rw_io() leaks
out to pblk_make_rq(). The code is a mismatch of some bio_endio calls
inside the pblk_rw_io, and others outside. It's not coherent.