On Wed, Aug 30, 2017 at 03:06:16PM -0700, Shaohua Li wrote:
> On Wed, Aug 30, 2017 at 02:43:40PM +0800, Ming Lei wrote:
> > On Tue, Aug 29, 2017 at 09:43:20PM -0700, Shaohua Li wrote:
> > > On Wed, Aug 30, 2017 at 10:51:21AM +0800, Ming Lei wrote:
> > > > On Tue, Aug 29, 2017 at 08:13:39AM -0700, Shaohua Li wrote:
> > > > > On Tue, Aug 29, 2017 at 05:56:05PM +0800, Ming Lei wrote:
> > > > > > On Thu, Aug 24, 2017 at 12:24:53PM -0700, Shaohua Li wrote:
> > > > > > > From: Shaohua Li <shli@xxxxxx>
> > > > > > >
> > > > > > > Currently loop disables merge. While it makes sense for buffer IO mode,
> > > > > > > directio mode can benefit from request merge. Without merge, loop could
> > > > > > > send small size IO to underlayer disk and harm performance.
> > > > > >
> > > > > > Hi Shaohua,
> > > > > >
> > > > > > IMO no matter if merge is used, loop always sends page by page
> > > > > > to VFS in both dio or buffer I/O.
> > > > >
> > > > > Why do you think so?
> > > >
> > > > do_blockdev_direct_IO() still handles page by page from iov_iter, and
> > > > with bigger request, I guess it might be the plug merge working.
> > >
> > > This is not true. directio sends big size bio directly, not because of plug
> > > merge. Please at least check the code before you complain.
> >
> > I complain nothing, just try to understand the idea behind,
> > never mind, :-)
> >
> > >
> > > > >
> > > > > > Also if merge is enabled on loop, that means merge is run
> > > > > > on both loop and low level block driver, and not sure if we
> > > > > > can benefit from that.
> > > > >
> > > > > why does merge still happen in low level block driver?
> > > >
> > > > Because scheduler is still working on low level disk. My question
> > > > is that why the scheduler in low level disk doesn't work now
> > > > if scheduler on loop can merge?
> > >
> > > The low level disk can still do merge, but since this is directio, the upper
> > > layer already dispatches request as big as possible. There is very little
> > > chance the requests can be merged again.
> >
> > That is true, but these requests need to enter scheduler queue and
> > be tried to merge again, even though it is less possible to succeed.
> > Double merge may take extra CPU utilization.
> >
> > Looks it doesn't answer my question.
> >
> > Without this patch, the requests dispatched to loop won't be merged,
> > so they may be small and their sectors may be continuous, my question
> > is why dio bios converted from these small loop requests can't be
> > merged in block layer when queuing these dio bios to low level device?
>
> loop thread doesn't have plug there. Even we have plug there, it's still a bad
> idea to do the merge in low level layer. If we run direct_IO for every 4k, the
> overhead is much much higher than bio merge. The direct_IO will call into fs
> code, take different mutexes, metadata update for write and so on.

OK, that looks making sense now.

-- 
Ming
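
For readers following the last point, here is a rough userspace sketch of why merging at the loop level cuts down the number of direct_IO calls. It is only a toy model, not kernel code: the Request tuple, merge_contiguous() and the max_sectors limit of 2560 are all hypothetical names and values made up for illustration. It assumes each request left after merging costs one direct_IO call into the filesystem.

```python
from typing import List, Tuple

# Hypothetical model of a block request: (start_sector, nr_sectors).
# This is not a kernel type, only an illustration.
Request = Tuple[int, int]


def merge_contiguous(requests: List[Request], max_sectors: int = 2560) -> List[Request]:
    """Back-merge a request into the previous one when it starts exactly
    where the previous one ends and the combined size fits in max_sectors.

    max_sectors is an assumed limit chosen for this sketch.
    """
    merged: List[Request] = []
    for start, length in requests:
        if merged:
            prev_start, prev_len = merged[-1]
            contiguous = prev_start + prev_len == start
            if contiguous and prev_len + length <= max_sectors:
                merged[-1] = (prev_start, prev_len + length)
                continue
        merged.append((start, length))
    return merged


# 64 sequential 4k requests, i.e. 8 sectors of 512 bytes each.
small = [(i * 8, 8) for i in range(64)]

# Assumption: one direct_IO call per request that reaches the loop thread.
calls_without_merge = len(small)
calls_with_merge = len(merge_contiguous(small))

print(calls_without_merge, calls_with_merge)  # prints "64 1"
```

In this model the unmerged case pays the per-call cost (fs entry, mutexes, metadata update for write) 64 times, while the merged case pays it once for a single 512-sector request. Merging later in the low level device would not recover that cost, since the direct_IO calls have already been made by then.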