On Tue, Feb 02, 2021 at 01:12:04PM +0900, Changheun Lee wrote:
> > On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote:
> > > > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> > > > > bio size can grow up to 4GB when multi-page bvec is enabled.
> > > > > but sometimes it would lead to inefficient behaviors.
> > > > > in case of large chunk direct I/O, - 32MB chunk read in user space -
> > > > > all pages for 32MB would be merged to a bio structure if the pages
> > > > > physical addresses are contiguous. it makes some delay to submit
> > > > > until merge complete. bio max size should be limited to a proper size.
> > > > >
> > > > > When 32MB chunk read with direct I/O option is coming from userspace,
> > > > > kernel behavior is below now. it's timeline.
> > > > >
> > > > >  | bio merge for 32MB. total 8,192 pages are merged.
> > > > >  | total elapsed time is over 2ms.
> > > > >  |------------------ ... ----------------------->|
> > > > >                                                  | 8,192 pages merged a bio.
> > > > >                                                  | at this time, first bio submit is done.
> > > > >                                                  | 1 bio is split to 32 read request and issue.
> > > > >                                                  |--------------->
> > > > >                                                   |--------------->
> > > > >                                                    |--------------->
> > > > >                                                               ......
> > > > >                                                                    |--------------->
> > > > >                                                                     |--------------->|
> > > > >                           total 19ms elapsed to complete 32MB read done from device. |
> > > > >
> > > > > If bio max size is limited with 1MB, behavior is changed below.
> > > > >
> > > > >  | bio merge for 1MB. 256 pages are merged for each bio.
> > > > >  | total 32 bio will be made.
> > > > >  | total elapsed time is over 2ms. it's same.
> > > > >  | but, first bio submit timing is fast. about 100us.
> > > > >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> > > > >       | 256 pages merged a bio.
> > > > >       | at this time, first bio submit is done.
> > > > >       | and 1 read request is issued for 1 bio.
> > > > >       |--------------->
> > > > >            |--------------->
> > > > >                 |--------------->
> > > > >                                       ......
> > > > >                                                  |--------------->
> > > > >                                                   |--------------->|
> > > > >                         total 17ms elapsed to complete 32MB read done from device. |
> > > >
> > > > Can you share us if enabling THP in your application can avoid this issue? BTW, you
> > > > need to make the 32MB buffer aligned with huge page size. IMO, THP perfectly fits
> > > > your case.
> > > >
> > >
> > > THP is enabled already like as below in my environment. It has no effect.
> > >
> > > cat /sys/kernel/mm/transparent_hugepage/enabled
> > > [always] madvise never
> >
> > The 32MB user buffer needs to be huge page size aligned. If your system
> > supports bcc/bpftrace, it is quite easy to check if the buffer is
> > aligned.
> >
> > >
> > > This issue was reported from performance benchmark application in open market.
> > > I can't control application's working in open market.
> > > It's not only my own case. This issue might be occurred in many mobile environment.
> > > At least, I checked this problem in Exynos, and Qualcomm chipset.
> >
> > You just said it takes 2ms for building 32MB bio, but you never investigate the
> > reason. I guess it is from get_user_pages_fast(), but maybe others. Can you dig
> > further for the reason? Maybe it is one arm64 specific issue.
> >
> > BTW, bio_iov_iter_get_pages() just takes ~200us on one x86_64 VM with THP, which is
> > observed via bcc/funclatency when running the following workload:
> >
>
> I think you focused on bio_iov_iter_get_pages() because I just commented page
> merge delay only. Sorry about that. I missed details of this issue.
> Actually there are many operations during while-loop in do_direct_IO().
> Page merge operation is just one among them. Page merge operation is called
> by dio_send_cur_page() in while-loop. Below is call stack.
>
> __bio_try_merge_page+0x4c/0x614
> bio_add_page+0x40/0x12c
> dio_send_cur_page+0x13c/0x374
> submit_page_section+0xb4/0x304
> do_direct_IO+0x3d4/0x854
> do_blockdev_direct_IO+0x488/0xa18
> __blockdev_direct_IO+0x30/0x3c
> f2fs_direct_IO+0x6d0/0xb80
> generic_file_read_iter+0x284/0x45c
> f2fs_file_read_iter+0x3c/0xac
> __vfs_read+0x19c/0x204
> vfs_read+0xa4/0x144
>
> 2ms delay is not only caused by page merge operation. it includes many the
> other operations too. But those many operations included page merge should
> be executed more if bio size is grow up.

OK, got it. Then I think you can just limit bio size in dio_bio_add_page()
instead of doing it for all.

-- 
Ming