On Tue, Aug 29, 2017 at 05:56:05PM +0800, Ming Lei wrote:
> On Thu, Aug 24, 2017 at 12:24:53PM -0700, Shaohua Li wrote:
> > From: Shaohua Li <shli@xxxxxx>
> >
> > Currently loop disables merge. While it makes sense for buffer IO mode,
> > directio mode can benefit from request merge. Without merge, loop could
> > send small size IO to underlayer disk and harm performance.
>
> Hi Shaohua,
>
> IMO no matter if merge is used, loop always sends page by page
> to VFS in both dio or buffer I/O.

Why do you think so?

> Also if merge is enabled on loop, that means merge is run
> on both loop and low level block driver, and not sure if we
> can benefit from that.

Why does merge still happen in the low-level block driver?

> So Could you provide some performance data about this patch?

In my virtual machine, a workload improves from ~20M/s to ~50M/s, and I
can clearly see that the request sizes become bigger.

> >
> > Reviewed-by: Omar Sandoval <osandov@xxxxxx>
> > Signed-off-by: Shaohua Li <shli@xxxxxx>
> > ---
> >  drivers/block/loop.c | 66 ++++++++++++++++++++++++++++++++++++++++------------
> >  drivers/block/loop.h |  1 +
> >  2 files changed, 52 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 428da07..75a8f6e 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -213,10 +213,13 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
> >  	 */
> >  	blk_mq_freeze_queue(lo->lo_queue);
> >  	lo->use_dio = use_dio;
> > -	if (use_dio)
> > +	if (use_dio) {
> > +		queue_flag_clear_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);
> >  		lo->lo_flags |= LO_FLAGS_DIRECT_IO;
> > -	else
> > +	} else {
> > +		queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, lo->lo_queue);
> >  		lo->lo_flags &= ~LO_FLAGS_DIRECT_IO;
> > +	}
> >  	blk_mq_unfreeze_queue(lo->lo_queue);
> >  }
> >
> > @@ -464,6 +467,8 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
> >  {
> >  	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
> >
> > +	kfree(cmd->bvec);
> > +	cmd->bvec = NULL;
> >  	cmd->ret = ret;
> >  	blk_mq_complete_request(cmd->rq);
> >  }
> > @@ -473,22 +478,50 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> >  {
> >  	struct iov_iter iter;
> >  	struct bio_vec *bvec;
> > -	struct bio *bio = cmd->rq->bio;
> > +	struct request *rq = cmd->rq;
> > +	struct bio *bio = rq->bio;
> >  	struct file *file = lo->lo_backing_file;
> > +	unsigned int offset;
> > +	int segments = 0;
> >  	int ret;
> >
> > -	/* nomerge for loop request queue */
> > -	WARN_ON(cmd->rq->bio != cmd->rq->biotail);
> > +	if (rq->bio != rq->biotail) {
> > +		struct req_iterator iter;
> > +		struct bio_vec tmp;
> > +
> > +		__rq_for_each_bio(bio, rq)
> > +			segments += bio_segments(bio);
> > +		bvec = kmalloc(sizeof(struct bio_vec) * segments, GFP_KERNEL);
>
> The allocation should have been GFP_NOIO.

Sounds good. Making this completely correct isn't easy though; we are
calling into the underlying fs operations, which can do allocations of
their own.

Thanks,
Shaohua
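
[Editor's note: for the allocations done inside the lower filesystem,
which no gfp flag on loop's own kmalloc() can reach, the kernel provides
a process-level NOIO scope, memalloc_noio_save()/memalloc_noio_restore().
The sketch below is illustrative only, not a tested patch; the placement
inside lo_rw_aio() is hypothetical:

	unsigned int noio_flags;

	/*
	 * Within this scope every allocation behaves as if GFP_NOIO had
	 * been passed, including allocations made by the underlying
	 * filesystem inside ->read_iter()/->write_iter().
	 */
	noio_flags = memalloc_noio_save();
	bvec = kmalloc(sizeof(struct bio_vec) * segments, GFP_KERNEL);
	...
	ret = call_read_iter(file, &cmd->iocb, &iter);
	memalloc_noio_restore(noio_flags);

This avoids auditing every allocation in the call chain: the scope, not
the individual gfp masks, guarantees that reclaim cannot issue I/O and
recurse back into the loop device under memory pressure.]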