> On Sun, Apr 11, 2021 at 10:13:01PM +0000, Damien Le Moal wrote:
> > On 2021/04/09 23:47, Bart Van Assche wrote:
> > > On 4/7/21 3:27 AM, Damien Le Moal wrote:
> > >> On 2021/04/07 18:46, Changheun Lee wrote:
> > >>> I'll prepare new patch as you recommand. It will be added setting of
> > >>> limit_bio_size automatically when queue max sectors is determined.
> > >>
> > >> Please do that in the driver for the HW that benefits from it. Do not do this
> > >> for all block devices.
> > >
> > > Hmm ... is it ever useful to build a bio with a size that exceeds
> > > max_hw_sectors when submitting a bio directly to a block device, or in
> > > other words, if no stacked block driver sits between the submitter and
> > > the block device? Am I perhaps missing something?
> >
> > Device performance wise, the benefits are certainly not obvious to me either.
> > But for very fast block devices, I think the CPU overhead of building more
> > smaller BIOs may be significant compared to splitting a large BIO into multiple
> > requests. Though it may be good to revisit this with some benchmark numbers.
>
> This patch tries to address issue[1] in do_direct_IO() in which
> Changheun observed that other operations takes time between adding page
> to bio.
>
> However, do_direct_IO() just does following except for adding bio and
> submitting bio:
>
> - retrieves pages at batch(pin 64 pages each time from VM) and
>
> - retrieve block mapping(get_more_blocks), which is still done usually
> very less times for 32MB; for new mapping, clean_bdev_aliases() may
> take a bit time.
>
> If there isn't system memory pressure, pin 64 pages won't be slow, but
> get_more_blocks() may take a bit time.
>
> Changheun, can you check if multiple get_more_blocks() is called for submitting
> 32MB in your test?

In my test, get_more_blocks() is called almost only once.

> In my 32MB sync dio f2fs test on x86_64 VM, one buffer_head mapping can
> hold 32MB, but it is one freshly new f2fs.
>
> I'd suggest to understand the issue completely before figuring out one
> solution.

Thank you for your advice. I'll analyze your point in more detail later. :)
But I think my concern is different from finding where most of the time
is spent in do_direct_IO(). I think the excessive looping itself should
be controlled: 8,192 loops in do_direct_IO() to submit one 32MB bio is
too many on a system with 4KB pages. I want to apply an optional
solution to avoid the excessive looping caused by multipage bvec.

Thanks,

Changheun Lee
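
For reference, the 8,192 figure can be reproduced with a few lines of arithmetic. This is only a rough sketch, not kernel code: it assumes one loop iteration per 4KB page of a 32MB direct I/O, and it takes the 64-page pin batch from the quoted description of do_direct_IO(). The names PAGE_SIZE, IO_SIZE, PIN_BATCH, pages_needed() and batches_needed() are hypothetical, chosen for illustration.

```python
# Back-of-the-envelope arithmetic only; this does not model do_direct_IO().
# All names here are hypothetical and exist just for this illustration.

PAGE_SIZE = 4 * 1024          # 4KB pages, as stated in the mail
IO_SIZE = 32 * 1024 * 1024    # one 32MB direct I/O
PIN_BATCH = 64                # pages pinned per batch, per the quoted text


def pages_needed(io_size: int, page_size: int) -> int:
    """Number of pages needed to cover io_size bytes, rounded up."""
    return (io_size + page_size - 1) // page_size


def batches_needed(pages: int, batch: int) -> int:
    """Number of pin batches needed for the given page count, rounded up."""
    return (pages + batch - 1) // batch


if __name__ == "__main__":
    pages = pages_needed(IO_SIZE, PAGE_SIZE)
    print(pages)                              # 8192
    print(batches_needed(pages, PIN_BATCH))   # 128
```

Under these assumptions the per-page loop count is 8,192, while the number of 64-page pin batches is only 128, which is why the per-page loop is the part I would like to bound.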