On Tue, 2023-11-07 at 11:48 +0800, Ming Lei wrote:
> On Tue, Nov 07, 2023 at 02:53:20AM +0000, Ed Tsai (蔡宗軒) wrote:
> > On Mon, 2023-11-06 at 19:54 +0800, Ming Lei wrote:
> > > On Mon, Nov 06, 2023 at 12:53:31PM +0800, Ming Lei wrote:
> > > > On Mon, Nov 06, 2023 at 01:40:12AM +0000, Ed Tsai (蔡宗軒) wrote:
> > > > > On Mon, 2023-11-06 at 09:33 +0800, Ed Tsai wrote:
> > > > > > On Sat, 2023-11-04 at 11:43 +0800, Ming Lei wrote:
> > > > ...
> > > > >
> > > > > Sorry for leaving out my dd command. Here it is:
> > > > > dd if=/data/test_file of=/dev/null bs=64m count=1 iflag=direct
> > > >
> > > > OK, thanks for sharing.
> > > >
> > > > I understand the issue now, but I am not sure it is a good idea to
> > > > check the queue limit in __bio_iov_iter_get_pages():
> > > >
> > > > 1) bio->bi_bdev may not be set
> > > >
> > > > 2) what actually matters is the bio's alignment, and the bio size
> > > > can still be big enough
> > > >
> > > > So I cooked up a patch, and it should address your issue:
> > >
> > > The following one fixes several bugs and has been verified to make
> > > big & aligned bios; feel free to run your test against it:
> > >
> > >  block/bio.c | 28 +++++++++++++++++++++++++++-
> > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/block/bio.c b/block/bio.c
> > > index 816d412c06e9..80b36ce57510 100644
> > > --- a/block/bio.c
> > > +++ b/block/bio.c
> > > @@ -1211,6 +1211,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
> > >  }
> > >  
> > >  #define PAGE_PTRS_PER_BVEC	(sizeof(struct bio_vec) / sizeof(struct page *))
> > > +#define BIO_CHUNK_SIZE	(256U << 10)
> > >  
> > >  /**
> > >   * __bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
> > > @@ -1266,6 +1267,31 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > >  		size -= trim;
> > >  	}
> > >  
> > > +	/*
> > > +	 * Try to make bio aligned with 128KB if it isn't the last one, so
> > > +	 * we can avoid small bio in case of big chunk sequential IO because
> > > +	 * of bio split and multipage bvec.
> > > +	 *
> > > +	 * If nothing is added to this bio, simply allow unaligned since we
> > > +	 * have chance to add more bytes
> > > +	 */
> > > +	if (iov_iter_count(iter) && bio->bi_iter.bi_size) {
> > > +		unsigned int aligned_size = (bio->bi_iter.bi_size + size) &
> > > +			~(BIO_CHUNK_SIZE - 1);
> > > +
> > > +		if (aligned_size <= bio->bi_iter.bi_size) {
> > > +			/* stop to add page if this bio can't keep aligned */
> > > +			if (!(bio->bi_iter.bi_size & (BIO_CHUNK_SIZE - 1))) {
> > > +				ret = left = size;
> > > +				goto revert;
> > > +			}
> > > +		} else {
> > > +			aligned_size -= bio->bi_iter.bi_size;
> > > +			iov_iter_revert(iter, size - aligned_size);
> > > +			size = aligned_size;
> > > +		}
> > > +	}
> > > +
> > >  	if (unlikely(!size)) {
> > >  		ret = -EFAULT;
> > >  		goto out;
> > > @@ -1285,7 +1311,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > >  
> > >  		offset = 0;
> > >  	}
> > > -
> > > +revert:
> > >  	iov_iter_revert(iter, left);
> > >  out:
> > >  	while (i < nr_pages)
> > > -- 
> > > 2.41.0
> > >
> > > Thanks,
> > > Ming
> >
> > The latest patch you provided, with 256KB alignment, does help reduce
> > the fragmentation. However, the appropriate alignment size may vary
> > depending on the device, so a fixed, universal size of 128KB or 256KB
> > only partially relieves the fragmentation.
> >
> > I performed a 64MB dd direct I/O read with your patch, and although
> > most of the bios were aligned, there were still cases of misalignment
> > with respect to the device limit (e.g., 512MB for my device), as shown
> > below:
>
> 512MB is really big, and you actually reached 3520MB/sec in READ by
> limiting the max bio size to 1MB in your original patch.
>
> Just curious: what does the data look like if you change it to align
> with max sectors on top of my last patch, which tries to maximize &
> align the bio?

Sorry, that is a typo. It should be 512KB instead.

> >
> > dd [000] ..... 392.976830: block_bio_queue: 254,52 R 2997760 + 3584
> > dd [000] ..... 392.979940: block_bio_queue: 254,52 R 3001344 + 3584
> > dd [000] ..... 392.983235: block_bio_queue: 254,52 R 3004928 + 3584
> > dd [000] ..... 392.986468: block_bio_queue: 254,52 R 3008512 + 3584
>
> Yeah, I thought 128KB would be fine for typical hardware, but it looks
> like that is not good enough.
>
> >
> > Compared with the previous data, the Antutu sequential test results
> > do show an improvement, but there is still a small gap compared with
> > limiting the bio size to max_sectors:
> >
> > Sequential Read (average of 5 rounds):
> > Original:               3033.7 MB/sec
> > Limited to max_sectors: 3520.9 MB/sec
> > Aligned 256KB:          3471.5 MB/sec
> >
> > Sequential Write (average of 5 rounds):
> > Original:               2225.4 MB/sec
> > Limited to max_sectors: 2800.3 MB/sec
> > Aligned 256KB:          2618.1 MB/sec
>
> Thanks for sharing the data.
>
> >
> > What if we limit the bio size only for devices that have set
> > max_sectors?
>
> I think it may be doable, but it needs a smarter approach to avoid the
> extra cost of iov_iter_revert(); one way is to add bio_shrink() (or
> bio_revert()) so that the alignment runs just once.
>
> I will think about it further and write a new patch if it is doable.
>
>
> Thanks,
> Ming
>

Thank you very much. I will keep following this issue and see whether
any difficulties or alternative directions come up.

Best,
Ed
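The alignment rule in the BIO_CHUNK_SIZE hunk of Ming's patch, and the block_bio_queue numbers in the trace, can be checked with a small userspace model. This is only a sketch, not kernel code: keep_bytes() and sectors_aligned() are hypothetical helpers that mirror the arithmetic of the hunk (current bio size, newly pinned bytes, and whether the iterator still has data); page pinning, iov_iter_revert() and bio splitting are not modeled. It assumes 512-byte sectors, as in the trace.

```c
#include <assert.h>	/* assert() for quick checks of the model */
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL	/* assumption: 512-byte sectors, as in the trace */

/*
 * Hypothetical userspace model of the BIO_CHUNK_SIZE hunk in the quoted
 * patch. Returns how many of the `size` newly pinned bytes are kept.
 *   bi_size:   bytes already in the bio (bio->bi_iter.bi_size)
 *   size:      bytes just pinned from the iterator
 *   more_data: true if iov_iter_count(iter) is still nonzero
 *   chunk:     alignment unit; must be a power of two
 * A return value of 0 models the "goto revert" path: nothing more is
 * added and the bio is submitted as it is.
 */
static unsigned int keep_bytes(unsigned int bi_size, unsigned int size,
			       bool more_data, unsigned int chunk)
{
	unsigned int aligned_size;

	/* Last piece of the request, or empty bio: no alignment applied. */
	if (!more_data || bi_size == 0)
		return size;

	/* Round the would-be bio size down to a chunk boundary. */
	aligned_size = (bi_size + size) & ~(chunk - 1);

	if (aligned_size <= bi_size) {
		/* No new boundary reachable: stop if already aligned... */
		if ((bi_size & (chunk - 1)) == 0)
			return 0;
		/* ...otherwise keep all of it; a later call may align. */
		return size;
	}

	/* Trim so that the bio ends exactly on a chunk boundary. */
	return aligned_size - bi_size;
}

/* True if a length given in 512-byte sectors is a multiple of chunk bytes. */
static bool sectors_aligned(uint64_t sectors, uint64_t chunk)
{
	return (sectors * SECTOR_SIZE) % chunk == 0;
}
```

For example, with a 256KB chunk, a bio already holding 200KB that gets 100KB more keeps only 56KB, so it ends exactly at 256KB. The trace's 3584-sector bios are 1792KB = 7 x 256KB, so sectors_aligned(3584, 256U << 10) is true while sectors_aligned(3584, 512U << 10) is false: each bio is 3.5 x 512KB, which is consistent with Ed's observation that 256KB alignment still leaves misalignment against a 512KB device limit. Passing a device-derived size (for instance from max_sectors) as `chunk` instead of a fixed constant is one way to model Ed's suggestion; it is an assumption here, not what the posted patch does.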