>On 2021/01/13 13:01, Changheun Lee wrote:
>>> On 2021/01/12 21:14, Changheun Lee wrote:
>>>>> On 2021/01/12 17:52, Changheun Lee wrote:
>>>>>> From: "Changheun Lee" <nanich.lee@xxxxxxxxxxx>
>>>>>>
>>>>>> bio size can grow up to 4GB when muli-page bvec is enabled.
>>>>>> but sometimes it would lead to inefficient behaviors.
>>>>>> in case of large chunk direct I/O, - 64MB chunk read in user space -
>>>>>> all pages for 64MB would be merged to a bio structure if memory address is
>>>>>> continued phsycally. it makes some delay to submit until merge complete.
>>>>>> bio max size should be limited as a proper size.
>>>>>
>>>>> But merging physically contiguous pages into the same bvec + later automatic bio
>>>>> split on submit should give you better throughput for large IOs compared to
>>>>> having to issue a bio chain of smaller BIOs that are arbitrarily sized and will
>>>>> likely need splitting anyway (because of DMA boundaries etc).
>>>>>
>>>>> Do you have a specific case where you see higher performance with this patch
>>>>> applied ? On Intel, BIO_MAX_SIZE would be 1MB... That is arbitrary and too small
>>>>> considering that many hardware can execute larger IOs than that.
>>>>>
>>>>
>>>> When I tested 32MB chunk read with O_DIRECT in android, all pages of 32MB
>>>> is merged into a bio structure.
>>>> And elapsed time to merge complete was about 2ms.
>>>> It means first bio-submit is after 2ms.
>>>> If bio size is limited with 1MB with this patch, first bio-submit is about
>>>> 100us by bio_full operation.
>>>
>>> bio_submit() will split the large BIO case into multiple requests while the
>>> small BIO case will likely result one or two requests only. That likely explain
>>> the time difference here. However, for the large case, the 2ms will issue ALL
>>> requests needed for processing the entire 32MB user IO while the 1MB bio case
>>> will need 32 different bio_submit() calls. So what is the actual total latency
>>> difference for the entire 32MB user IO ? That is I think what needs to be
>>> compared here.
>>>
>>> Also, what is your device max_sectors_kb and max queue depth ?
>>>
>>
>> 32MB total latency is about 19ms including merge time without this patch.
>> But with this patch, total latency is about 17ms including merge time too.
>> Actually 32MB read time from device is same - about 16.7ms - in driver layer.
>> No need to hold more I/O than max_sectors_kb during bio merge.
>> My device is UFS. and max_sectors_kb is 1MB, queue depth is 32.
>
>This may be due to the CPU being slow: it takes time to build the 32MB BIO,
>during which the device is idling. With the 1MB BIO limit, the first BIO hits
>the drive much more quickly, reducing idle time of the device and leading to
>higher throughput. But there are 31 more BIOs to build and issue after the first
>one... That does create a pipeline of requests keeping the device busy, but that
>also likely keeps your CPU a lot more busy building these additional BIOs.
>Overall, do you see a difference in CPU load for the 32MB BIO case vs the 1MB
>max BIO case ? Any increase in CPU load with the lower BIO size limit ?
>

CPU load is higher than in the 32MB bio case, because bio submission runs in
parallel with the merging. But I tested at the same CPU operating frequency,
so there is no difference in CPU speed between the two cases.
Logically, the bio max size is 4GB now. I know that is not realistic, but
merging 4GB into a single bio would take a long time even with a fast CPU.
This is why I think the bio max size should be limited.
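
To make the timing argument concrete, below is a rough illustrative sketch of
how a direct-I/O style submitter builds bios page by page. It is not the real
__blkdev_direct_IO code, and the submit_pages_as_bios() helper and its
arguments are made up for this example. The point is that the first
submit_bio() only happens once bio_full() reports the current bio cannot take
more data, so a smaller per-bio size cap lets the device see its first request
sooner:

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Hypothetical helper, for illustration only. */
static void submit_pages_as_bios(struct block_device *bdev, sector_t sector,
				 struct page **pages, unsigned int nr_pages)
{
	struct bio *bio = NULL;
	unsigned int i;

	for (i = 0; i < nr_pages; i++) {
		if (!bio) {
			bio = bio_alloc(GFP_KERNEL, BIO_MAX_PAGES);
			bio_set_dev(bio, bdev);
			bio->bi_iter.bi_sector = sector;
			bio->bi_opf = REQ_OP_READ;
		}

		/*
		 * Merge the page into the current bio. When the bio is full
		 * (bi_vcnt or bi_size limit reached), submit it and retry
		 * this page in a fresh bio. A smaller size limit makes this
		 * first submit_bio() happen after fewer merged pages.
		 */
		if (bio_full(bio, PAGE_SIZE) ||
		    bio_add_page(bio, pages[i], PAGE_SIZE, 0) != PAGE_SIZE) {
			sector += bio_sectors(bio);
			submit_bio(bio);
			bio = NULL;
			i--;		/* retry this page in the next bio */
		}
	}

	if (bio)
		submit_bio(bio);
}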
>>
>>>> It's not large delay and can't be observed with low speed device.
>>>> But it's needed to reduce merge delay for high speed device.
>>>> I improved 512MB sequential read performance from 1900MB/s to 2000MB/s
>>>> with this patch on android platform.
>>>> As you said, 1MB might be small for some device.
>>>> But method is needed to re-size, or select the bio max size.
>>>
>>> At the very least, I think that such limit should not be arbitrary as your patch
>>> proposes but rely on the device characteristics (e.g.
>>> max_hw_sectors_kb/max_sectors_kb and queue depth).
>>>
>>
>> I agree with your opinion, I thought same as your idea. For that, deep research
>> is needed, proper timing to set and bio structure modification, etc ...
>
>Why would you need any BIO structure modifications ? Your patch is on the right
>track if limiting the BIO size is the right solution (I am not still completely
>convinced). E.g., the code:
>
>	if (page_is_mergeable(bv, page, len, off, same_page)) {
>		if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
>			*same_page = false;
>			return false;
>		}
>
>could just become:
>
>	if (page_is_mergeable(bv, page, len, off, same_page)) {
>		if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
>			*same_page = false;
>			return false;
>		}
>
>With bio_max_size() being something like:
>
>static inline size_t bio_max_size(struct bio *bio)
>{
>	sector_t max_sectors = blk_queue_get_max_sectors(bio->bi_disk->queue,
>							 bio_op(bio));
>
>	return max_sectors << SECTOR_SHIFT;
>}
>
>Note that this is not super efficient as a BIO maximum size depends on the BIO
>offset too (its start sector). So writing something similar to
>blk_rq_get_max_sectors() would probably be better.

Good suggestion. :)

>
>> Current is simple patch for default bio max size.
>> Before applying of multipage bvec, bio max size was 1MB in kernel 4.x by BIO_MAX_PAGES.
>> So I think 1MB bio max size is reasonable as a default.
>
>max_sectors_kb is always defined for any block device so I do not think there is
>a need for any arbitrary default value.
>
>Since such optimization likely very much depend on the speed of the system CPU
>and of the storage device used, it may be a good idea to have this configurable
>through sysfs. That is, bio_max_size() simply returns UINT_MAX leading to no
>change from the current behavior if the optimization is disabled (default) and
>max_sectors_kb if it is enabled.
>

OK, I agree with you. That will be the best approach for everyone for now.
I'll try to implement it.
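
A rough sketch of what the sysfs-controlled variant could look like, modeled
on your bio_max_size() suggestion and the existing handlers in
block/blk-sysfs.c, is below. The QUEUE_FLAG_LIMIT_BIO_SIZE flag and the
"limit_bio_size" attribute name are only placeholders for illustration, not
final names, and this is not the final patch:

#include <linux/bio.h>
#include <linux/blkdev.h>

static inline unsigned int bio_max_size(struct bio *bio)
{
	struct request_queue *q = bio->bi_disk->queue;

	/* Optimization disabled (default): behave exactly as today. */
	if (!test_bit(QUEUE_FLAG_LIMIT_BIO_SIZE, &q->queue_flags))
		return UINT_MAX;

	/* Optimization enabled: cap the bio at the queue's max_sectors. */
	return queue_max_sectors(q) << SECTOR_SHIFT;
}

/* sysfs store handler: echo 1 > /sys/block/<dev>/queue/limit_bio_size */
static ssize_t queue_limit_bio_size_store(struct request_queue *q,
					  const char *page, size_t count)
{
	unsigned long val;
	ssize_t ret = queue_var_store(&val, page, count);

	if (ret < 0)
		return ret;

	if (val)
		blk_queue_flag_set(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
	else
		blk_queue_flag_clear(QUEUE_FLAG_LIMIT_BIO_SIZE, q);

	return ret;
}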
>>
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Changheun Lee <nanich.lee@xxxxxxxxxxx>
>>>>>> ---
>>>>>>  block/bio.c         | 2 +-
>>>>>>  include/linux/bio.h | 3 ++-
>>>>>>  2 files changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block/bio.c b/block/bio.c
>>>>>> index 1f2cc1fbe283..dbe14d675f28 100644
>>>>>> --- a/block/bio.c
>>>>>> +++ b/block/bio.c
>>>>>> @@ -877,7 +877,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>>>>>>  		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>>>>>
>>>>>>  		if (page_is_mergeable(bv, page, len, off, same_page)) {
>>>>>> -			if (bio->bi_iter.bi_size > UINT_MAX - len) {
>>>>>> +			if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
>>>>>>  				*same_page = false;
>>>>>>  				return false;
>>>>>>  			}
>>>>>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>>>>>> index 1edda614f7ce..0f49b354b1f6 100644
>>>>>> --- a/include/linux/bio.h
>>>>>> +++ b/include/linux/bio.h
>>>>>> @@ -20,6 +20,7 @@
>>>>>>  #endif
>>>>>>
>>>>>>  #define BIO_MAX_PAGES		256
>>>>>> +#define BIO_MAX_SIZE		(BIO_MAX_PAGES * PAGE_SIZE)
>>>>>>
>>>>>>  #define bio_prio(bio)			(bio)->bi_ioprio
>>>>>>  #define bio_set_prio(bio, prio)		((bio)->bi_ioprio = prio)
>>>>>> @@ -113,7 +114,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
>>>>>>  	if (bio->bi_vcnt >= bio->bi_max_vecs)
>>>>>>  		return true;
>>>>>>
>>>>>> -	if (bio->bi_iter.bi_size > UINT_MAX - len)
>>>>>> +	if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len)
>>>>>>  		return true;
>>>>>>
>>>>>>  	return false;
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Damien Le Moal
>>>>> Western Digital Research
>>>>
>>>
>>>
>>> --
>>> Damien Le Moal
>>> Western Digital Research
>>>
>>
>> ---
>> Changheun Lee
>> Samsung Electronics
>>
>>
>
>
>--
>Damien Le Moal
>Western Digital Research
>

---
Changheun Lee
Samsung Electronics