> On 7 Mar 2019, at 08.23, Chansol Kim <chansol.kim@xxxxxxxxxxx> wrote: > > For large size io where blk_queue_split needs to be called inside > pblk_rw_io, results in bio leak as bio_endio is not called on the > newly allocated. One way to observe this is to mounting ext4 > filesystem on the target and issuing 1MB io with dd, e.g., dd bs=1MB > if=/dev/null of=/mount/myvolume. kmemleak reports: > > unreferenced object 0xffff88803d7d0100 (size 256): > comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s) > hex dump (first 32 bytes): > 00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff .........`.1.... > 01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00 .@.............. > backtrace: > [<000000001f5aa04f>] kmem_cache_alloc+0x204/0x3c0 > [<0000000040945aab>] mempool_alloc_slab+0x1d/0x30 > [<00000000b4959ab4>] mempool_alloc+0x83/0x220 > [<00000000646bad9b>] bio_alloc_bioset+0x229/0x320 > [<000000009264b251>] bio_clone_fast+0x26/0xc0 > [<0000000008250252>] bio_split+0x41/0x110 > [<00000000e365cad0>] blk_queue_split+0x349/0x930 > [<00000000eb5426bc>] pblk_make_rq+0x1b5/0x1f0 > [<00000000eea09cec>] generic_make_request+0x2f9/0x690 > [<00000000ae6acede>] submit_bio+0x12e/0x1f0 > [<00000000f9b8b82a>] ext4_io_submit+0x64/0x80 > [<000000009e4f817d>] ext4_bio_write_page+0x32e/0x890 > [<00000000cbd0d106>] mpage_submit_page+0x65/0xc0 > [<000000000eec7359>] mpage_map_and_submit_buffers+0x171/0x330 > [<000000009a7afcb6>] ext4_writepages+0xd5e/0x1650 > [<000000004476b096>] do_writepages+0x39/0xc0 > > In case there is a need for a split, blk_queue_split returns the newly > allocated bio to the caller by changing the value of pointer passed as > a reference, while the original is passed to generic_make_requests. > > Although pblk_rw_io's local variable bio* has changed and passed to > pblk_submit_read and pblk_write_to_cache, work is done on this new > bio*, and pblk_rw_io returns NVM_IO_DONE, pblk_make_rq calls bio_endio > on the old bio* because it passed bio pointer by value to pblk_rw_io. > > pblk_rw_io is unfolded into pblk_make_rq so that there is no copying > of bio* and bio_endio is called on the correct bio*. > > Signed-off-by: Chansol Kim <chansol.kim@xxxxxxxxxxx> > --- > v2: > - Instead of changing pblk_rw_io's parameter, unfolded it into pblk_make_rq > > drivers/lightnvm/pblk-init.c | 47 ++++++++++++++++++-------------------------- > 1 file changed, 19 insertions(+), 28 deletions(-) > > diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c > index 8b643d0..0c98305 100644 > --- a/drivers/lightnvm/pblk-init.c > +++ b/drivers/lightnvm/pblk-init.c > @@ -47,36 +47,10 @@ static struct pblk_global_caches pblk_caches = { > > struct bio_set pblk_bio_set; > > -static int pblk_rw_io(struct request_queue *q, struct pblk *pblk, > - struct bio *bio) > -{ > - int ret; > - > - /* Read requests must be <= 256kb due to NVMe's 64 bit completion bitmap > - * constraint. Writes can be of arbitrary size. > - */ > - if (bio_data_dir(bio) == READ) { > - blk_queue_split(q, &bio); > - ret = pblk_submit_read(pblk, bio); > - if (ret == NVM_IO_DONE && bio_flagged(bio, BIO_CLONED)) > - bio_put(bio); > - > - return ret; > - } > - > - /* Prevent deadlock in the case of a modest LUN configuration and large > - * user I/Os. Unless stalled, the rate limiter leaves at least 256KB > - * available for user I/O. > - */ > - if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl)) > - blk_queue_split(q, &bio); > - > - return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER); > -} > - > static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio) > { > struct pblk *pblk = q->queuedata; > + int ret; > > if (bio_op(bio) == REQ_OP_DISCARD) { > pblk_discard(pblk, bio); > @@ -86,7 +60,24 @@ static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio) > } > } > > - switch (pblk_rw_io(q, pblk, bio)) { > + /* Read requests must be <= 256kb due to NVMe's 64 bit completion bitmap > + * constraint. Writes can be of arbitrary size. > + */ > + if (bio_data_dir(bio) == READ) { > + blk_queue_split(q, &bio); > + ret = pblk_submit_read(pblk, bio); > + } else { > + /* Prevent deadlock in the case of a modest LUN configuration > + * and large user I/Os. Unless stalled, the rate limiter > + * leaves at least 256KB available for user I/O. > + */ > + if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl)) > + blk_queue_split(q, &bio); > + > + ret = pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER); > + } > + > + switch (ret) { > case NVM_IO_ERR: > bio_io_error(bio); > break; > -- > 2.7.4 Looks good to me. Reviewed-by: Javier González <javier@xxxxxxxxxxx>
Attachment:
signature.asc
Description: Message signed with OpenPGP