Re: [PATCH V2 2/2] block/loop: allow request merge for directio mode

Shaohua Li <shli@xxxxxxxxxx> · Wed, 30 Aug 2017 15:06:16 -0700

On Wed, Aug 30, 2017 at 02:43:40PM +0800, Ming Lei wrote:
> On Tue, Aug 29, 2017 at 09:43:20PM -0700, Shaohua Li wrote:
> > On Wed, Aug 30, 2017 at 10:51:21AM +0800, Ming Lei wrote:
> > > On Tue, Aug 29, 2017 at 08:13:39AM -0700, Shaohua Li wrote:
> > > > On Tue, Aug 29, 2017 at 05:56:05PM +0800, Ming Lei wrote:
> > > > > On Thu, Aug 24, 2017 at 12:24:53PM -0700, Shaohua Li wrote:
> > > > > > From: Shaohua Li <shli@xxxxxx>
> > > > > > 
> > > > > > Currently loop disables merge. While it makes sense for buffer IO mode,
> > > > > > directio mode can benefit from request merge. Without merge, loop could
> > > > > > send small IOs to the underlying disk and harm performance.
> > > > > 
> > > > > Hi Shaohua,
> > > > > 
> > > > > IMO, whether or not merge is used, loop always sends page by page
> > > > > to VFS in both dio and buffered I/O.
> > > > 
> > > > Why do you think so?
> > > 
> > > do_blockdev_direct_IO() still handles page by page from iov_iter, and
> > > with bigger request, I guess it might be the plug merge working.
> > 
> > This is not true. directio sends large bios directly, not because of plug
> > merge. Please at least check the code before you complain.
> 
> I'm not complaining, just trying to understand the idea behind it;
> never mind, :-)
> 
> > 
> > > >  
> > > > > Also if merge is enabled on loop, that means merge is run
> > > > > on both loop and low level block driver, and not sure if we
> > > > > can benefit from that.
> > > > 
> > > > why does merge still happen in low level block driver?
> > > 
> > > Because the scheduler is still working on the low level disk. My question
> > > is: why doesn't the scheduler on the low level disk do the job now,
> > > if the scheduler on loop can merge?
> > 
> > The low level disk can still do merge, but since this is directio, the upper
> > layer already dispatches request as big as possible. There is very little
> > chance the requests can be merged again.
> 
> That is true, but these requests still need to enter the scheduler queue
> and go through another merge attempt, even though it is less likely to
> succeed. Double merging may cost extra CPU.
> 
> Looks like it doesn't answer my question.
> 
> Without this patch, the requests dispatched to loop won't be merged,
> so they may be small and their sectors may be contiguous; my question
> is why dio bios converted from these small loop requests can't be
> merged in block layer when queuing these dio bios to low level device?

The loop thread doesn't have a plug there. Even if we had a plug there, it
would still be a bad idea to do the merge in the low level layer. If we run
direct_IO for every 4k, the overhead is much higher than that of a bio merge:
direct_IO calls into fs code, takes various mutexes, updates metadata for
writes, and so on.
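
To make the argument concrete, here is a toy user-space sketch of merging
contiguous requests before the expensive per-request path runs. It is not the
loop driver or block layer code: struct toy_req, toy_merge() and the
max_sectors limit are hypothetical names made up for illustration. The only
point is that every request left after merging stands for one direct_IO call.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical, simplified request: start sector plus length in sectors. */
struct toy_req {
	unsigned long long sector;
	unsigned int nr_sectors;
};

/*
 * Merge back-to-back contiguous requests in place, never growing a request
 * beyond max_sectors (an assumed limit). Returns the number of requests
 * left, i.e. how many times the expensive path would have to run.
 */
size_t toy_merge(struct toy_req *reqs, size_t n, unsigned int max_sectors)
{
	size_t out = 0;	/* index of the last request kept so far */
	size_t i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		struct toy_req *last = &reqs[out];
		bool contiguous =
			last->sector + last->nr_sectors == reqs[i].sector;
		bool fits =
			last->nr_sectors + reqs[i].nr_sectors <= max_sectors;

		if (contiguous && fits)
			last->nr_sectors += reqs[i].nr_sectors;	/* back merge */
		else
			reqs[++out] = reqs[i];	/* start a new request */
	}
	return out + 1;
}
```

For example, 64 contiguous 4k requests (8 sectors each) collapse into a single
512-sector request when max_sectors is 1024, so the expensive path would run
once instead of 64 times.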