On 10/5/22 12:10, Keith Busch wrote:
> On Wed, Oct 05, 2022 at 11:23:32AM -0700, Bart Van Assche wrote:
>> That's an interesting question. Your question made me realize that the
>> bio_map_kern() changes I proposed can be dropped if the code for counting
>> the number of segments is modified to support small segments.
>>
>> My answer to your question is twofold:
>> * Splitting segments in a driver is easy to do if that doesn't cause the
>> number of segments limit to be exceeded (queue_limits.max_segments). It is
>> the responsibility of the block layer to split bios that exceed the maximum
>> number of segments into multiple bios - this is something that cannot be
>> done in a block driver. This is why I think that a (small number of) block
>> layer changes are needed.
>
> I believe all bio's that bio_split() yields are supposed to be usable as-is
> with the driver that created the queue limits. If the driver needs to split
> further from there, I feel like that means the limits may need adjusting.
>
> It sounds like max_hw_sectors is inconsistent with max_segments. Shouldn't this
> work if max_hw_sectors was set to 'max_segments * logical_block_size'?
>
>> * The blk_rq_map_sg() function really needs to be modified to support
>> segments smaller than the page size.
>
> That's surprising. We use that in nvme where merges and splits to 4k segments
> are required, but it works with larger page sizes.

Hi Keith,
bio_add_page() can keep adding pages to a bio until either the
bi_max_vecs limit is reached or the bi_iter.bi_size limit is reached.
Splitting a bio involves calling bvec_split_segs(). The latter function
supports multiple segments per bvec. Hence, blk_rq_map_sg() may receive
a bio with multiple segments per bvec. This is why I think that
blk_rq_map_sg() has to be modified to support multiple segments per page.
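
To illustrate the segment counting, here is a rough userspace sketch, not
the kernel code. struct demo_bvec and demo_count_segs() are hypothetical
names made up for this example, and the sketch assumes that the maximum
segment size is the only limit that applies (it ignores the segment boundary
mask and the max_segments cap that bvec_split_segs() also takes into
account):

```c
/* Hypothetical stand-in for struct bio_vec; only the length matters here. */
struct demo_bvec {
	unsigned int len;	/* number of bytes covered by this bvec */
};

/*
 * Count the segments that a single bvec occupies if no segment may be
 * larger than max_seg_size bytes. Assumption: max_seg_size > 0 and no
 * other limit (boundary mask, max_segments) applies.
 */
unsigned int demo_count_segs(const struct demo_bvec *bv,
			     unsigned int max_seg_size)
{
	unsigned int remaining = bv->len;
	unsigned int nsegs = 0;

	while (remaining) {
		/* Each segment takes at most max_seg_size bytes. */
		unsigned int seg_len = remaining < max_seg_size ?
				       remaining : max_seg_size;

		remaining -= seg_len;
		nsegs++;
	}

	return nsegs;
}
```

With these assumptions a 4096-byte bvec counts as one segment if the limit
is 4096 bytes and as eight segments if the limit is 512 bytes, so a single
bvec can stand for more than one segment.
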
Thanks,
Bart.