> On Fri, May 14, 2021 at 03:32:41PM +0900, Changheun Lee wrote:
> > I tested a 512MB file read with direct I/O; the chunk size was 64MB.
> >  - on SCSI disk, no limit on bio max size (4GB)  : avg. 630 MB/s
> >  - on SCSI disk, bio max size limited to 1MB     : avg. 645 MB/s
> >  - on ramdisk, no limit on bio max size (4GB)    : avg. 2749 MB/s
> >  - on ramdisk, bio max size limited to 1MB       : avg. 3068 MB/s
> >
> > I set up the ramdisk environment as below.
> >  - dd if=/dev/zero of=/mnt/ramdisk.img bs=$((1024*1024)) count=1024
> >  - mkfs.ext4 /mnt/ramdisk.img
> >  - mkdir /mnt/ext4ramdisk
> >  - mount -o loop /mnt/ramdisk.img /mnt/ext4ramdisk
> >
> > On a low-performance disk, the bio submit delay caused by a large bio
> > is only a small portion of the total I/O time, so it is hard to notice.
> > But it shows up on a high-performance disk.
>
> So let's attack the problem properly:
>
>  1) switch f2fs to a direct I/O implementation that does not suck
>  2) look into optimizing the iomap code to e.g. submit the bio once
>     it is larger than queue_io_opt() without failing to add to a bio
>     which would be annoying for things like huge pages.

There is a bio submit delay in iomap_dio_bio_actor() too. As the bio size
grows, bio_iov_iter_get_pages() in iomap_dio_bio_actor() takes more time.
I measured how much time is spent in bio_iov_iter_get_pages() for each
bio size with ftrace. I added trace markers at the position below.

--------------
static loff_t
iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
                struct iomap_dio *dio, struct iomap *iomap)
{
        ... snip ...

        nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
        do {
                ... snip ...

                trace_mark_begin_end('B', "iomap_dio_bio_actor",
                                     "bio_iov_iter_get_pages",
                                     "bi_size", bio->bi_iter.bi_size, 0);
                ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
                trace_mark_begin_end('E', "iomap_dio_bio_actor",
                                     "bio_iov_iter_get_pages",
                                     "bi_size", bio->bi_iter.bi_size, 0);

                ... snip ...
        } while (nr_pages);

        ... snip ...
}
--------------

The bio submit delay (from the 'B' marker to block_bio_queue) was 0.834ms
for a 32MB bio:
----------
4154.574861: mark_begin_end: B|11511|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=0;
4154.575317: mark_begin_end: E|11511|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=34181120;
4154.575695: block_bio_queue: 7,5 R 719672 + 66760 [tiotest]
----------

The bio submit delay was 0.027ms for a 1MB bio:
----------
4868.617791: mark_begin_end: B|19510|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=0;
4868.617807: mark_begin_end: E|19510|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=1048576;
4868.617818: block_bio_queue: 7,5 R 1118208 + 2048 [tiotest]
----------

To optimize this, the current patch, or a similar approach, is needed in
bio_iov_iter_get_pages() (a sketch of what I mean is at the end of this
mail). Is it OK to add a new function to set the bio max size, as below,
and to call it wherever a bio max size limit is needed?

void blk_queue_bio_max_size(struct request_queue *q, unsigned int bytes)
{
        struct queue_limits *limits = &q->limits;
        unsigned int bio_max_size = round_up(bytes, PAGE_SIZE);

        limits->bio_max_bytes = max_t(unsigned int, bio_max_size,
                                      BIO_MAX_VECS * PAGE_SIZE);
}
EXPORT_SYMBOL(blk_queue_bio_max_size);
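
For example, a driver that owns the request queue could apply the limit
at initialization time. The call site below is just an illustration, not
part of the patch (SZ_1M comes from <linux/sizes.h>). Note that the
max_t() above keeps the effective limit at or above
BIO_MAX_VECS * PAGE_SIZE (1MB with 4KB pages), so a caller cannot lower
it below the current default behaviour.

--------------
#include <linux/sizes.h>

/* Illustrative call site only: cap bios at 1MB on a device where
 * large bios delay submission, e.g. from a driver's queue setup.
 */
static void example_setup_queue(struct request_queue *q)
{
        blk_queue_bio_max_size(q, SZ_1M);
}
--------------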
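
And the kind of change I mean in bio_iov_iter_get_pages() would be
roughly as below. This is an untested sketch, not the final patch:
bio_max_bytes is the new queue_limits field set above, and
bio_bytes_left() is an illustrative helper, not an existing API. The
idea is to cap the bytes pulled from the iterator per bio (currently
LONG_MAX in __bio_iov_iter_get_pages()), so the bio stops growing at
the limit and is submitted sooner.

--------------
/* Illustrative helper: bytes that may still be added before this
 * bio reaches the queue's bio_max_bytes limit. Assumes bi_bdev is
 * set, as it is in the iomap direct I/O path.
 */
static unsigned int bio_bytes_left(struct bio *bio)
{
        struct request_queue *q = bio->bi_bdev->bd_disk->queue;

        return q->limits.bio_max_bytes - bio->bi_iter.bi_size;
}

/* In __bio_iov_iter_get_pages(), replace the LONG_MAX size cap so
 * the bio never grows past the limit:
 */
        size = iov_iter_get_pages(iter, pages, bio_bytes_left(bio),
                                  nr_pages, &offset);
--------------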