On Sun, Apr 11, 2021 at 10:13:01PM +0000, Damien Le Moal wrote:
> On 2021/04/09 23:47, Bart Van Assche wrote:
> > On 4/7/21 3:27 AM, Damien Le Moal wrote:
> >> On 2021/04/07 18:46, Changheun Lee wrote:
> >>> I'll prepare new patch as you recommand. It will be added setting of
> >>> limit_bio_size automatically when queue max sectors is determined.
> >>
> >> Please do that in the driver for the HW that benefits from it. Do not do this
> >> for all block devices.
> >
> > Hmm ... is it ever useful to build a bio with a size that exceeds
> > max_hw_sectors when submitting a bio directly to a block device, or in
> > other words, if no stacked block driver sits between the submitter and
> > the block device? Am I perhaps missing something?
>
> Device performance wise, the benefits are certainly not obvious to me either.
> But for very fast block devices, I think the CPU overhead of building more
> smaller BIOs may be significant compared to splitting a large BIO into multiple
> requests. Though it may be good to revisit this with some benchmark numbers.

This patch tries to address issue [1] in do_direct_IO(), in which Changheun
observed that other operations take time between adding pages to the bio.
However, apart from building and submitting bios, do_direct_IO() only does
the following:

- retrieves pages in batches (pins 64 pages at a time from the VM), and

- retrieves the block mapping (get_more_blocks()), which is usually done
  only a few times for 32MB; for a new mapping, clean_bdev_aliases() may
  take a bit of time.

If there is no system memory pressure, pinning 64 pages won't be slow, but
get_more_blocks() may take a bit of time.

Changheun, can you check whether get_more_blocks() is called multiple times
when submitting 32MB in your test? A rough sketch of such a test is included
at the end of this mail.

In my 32MB sync dio f2fs test on an x86_64 VM, one buffer_head mapping can
hold the whole 32MB, but that is on a freshly created f2fs.

I'd suggest understanding the issue completely before settling on a solution.

[1] https://lore.kernel.org/linux-block/20210202041204.28995-1-nanich.lee@xxxxxxxxxxx/

Thanks,
Ming
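
P.S. For reference, a 32MB sync dio test like the one mentioned above can be
as simple as the program below. This is only an illustrative sketch (the file
path, 4KB buffer alignment and the timing output are assumptions for the
example, not my exact test program). The single 32MB write() is handled by
one direct I/O call on filesystems that go through __blockdev_direct_IO(),
so it exercises the do_direct_IO() loop discussed above.

/* dio32m.c - write 32MB to a file with O_DIRECT in one synchronous call */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE	(32UL << 20)	/* 32MB, submitted by a single write() */
#define BUF_ALIGN	4096		/* covers typical logical block sizes */

int main(int argc, char **argv)
{
	struct timespec t0, t1;
	void *buf;
	ssize_t ret;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file on the fs under test>\n", argv[0]);
		return 1;
	}

	/* O_DIRECT requires an aligned user buffer */
	if (posix_memalign(&buf, BUF_ALIGN, BUF_SIZE)) {
		perror("posix_memalign");
		return 1;
	}
	memset(buf, 0x5a, BUF_SIZE);

	fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	ret = write(fd, buf, BUF_SIZE);		/* one 32MB direct I/O */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (ret != (ssize_t)BUF_SIZE) {
		fprintf(stderr, "write failed or was short: %zd\n", ret);
		return 1;
	}

	printf("32MB O_DIRECT write: %.3f ms\n",
	       (t1.tv_sec - t0.tv_sec) * 1e3 +
	       (t1.tv_nsec - t0.tv_nsec) / 1e6);

	close(fd);
	free(buf);
	return 0;
}

Building it with "gcc -O2 -o dio32m dio32m.c" and running it on the test
filesystem while tracing the filesystem's get_block path (ftrace or perf)
should show how many times the block mapping is looked up per 32MB.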