On Wed, Oct 05, 2022 at 11:23:32AM -0700, Bart Van Assche wrote:
> On 10/5/22 10:00, Keith Busch wrote:
> > If the hardware's DMA segment is smaller than a page, why doesn't the driver
> > just split a kernel's larger segment into whatever representation the hardware
> > wants? We do that in nvme, at least.
>
> Hi Keith,
>
> That's an interesting question. Your question made me realize that the
> bio_map_kern() changes I proposed can be dropped if the code for counting
> the number of segments is modified to support small segments.
>
> My answer to your question is twofold:
> * Splitting segments in a driver is easy to do if that doesn't cause the
>   number of segments limit to be exceeded (queue_limits.max_segments). It is
>   the responsibility of the block layer to split bios that exceed the maximum
>   number of segments into multiple bios - this is something that cannot be
>   done in a block driver. This is why I think that a (small number of) block
>   layer changes are needed.

I believe all bios that bio_split() yields are supposed to be usable as-is with
the driver that created the queue limits. If the driver needs to split further
from there, I feel like that means the limits may need adjusting.

It sounds like max_hw_sectors is inconsistent with max_segments. Shouldn't this
work if max_hw_sectors was set to 'max_segments * logical_block_size'?

> * The blk_rq_map_sg() function really needs to be modified to support
>   segments smaller than the page size.

That's surprising. We use that in nvme where merges and splits to 4k segments
are required, but it works with larger page sizes.
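To make the segment-count concern concrete, here is a rough userspace sketch of
driver-side splitting. It is not kernel code: struct hw_seg, split_segment()
and max_request_bytes() are hypothetical names made up for illustration, and
the hardware segment size is assumed to be the unit that each entry counted by
max_segments can cover.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one hardware DMA segment; not a kernel type. */
struct hw_seg {
	size_t offset;	/* byte offset from the start of the original segment */
	size_t len;	/* length in bytes, at most hw_seg_size */
};

/*
 * Split one segment of seg_len bytes into pieces of at most hw_seg_size
 * bytes. At most max_segments pieces are written to out[]. The return
 * value is the number of pieces the segment needs; if it is larger than
 * max_segments, the request does not fit and splitting in the driver
 * alone cannot fix that.
 */
size_t split_segment(size_t seg_len, size_t hw_seg_size,
		     struct hw_seg *out, size_t max_segments)
{
	size_t n = 0, off = 0;

	assert(hw_seg_size > 0);
	while (off < seg_len) {
		size_t len = seg_len - off;

		if (len > hw_seg_size)
			len = hw_seg_size;
		if (n < max_segments) {
			out[n].offset = off;
			out[n].len = len;
		}
		n++;
		off += len;
	}
	return n;
}

/* Largest request, in bytes, that can never need more than max_segments pieces. */
size_t max_request_bytes(size_t max_segments, size_t hw_seg_size)
{
	return max_segments * hw_seg_size;
}
```

With max_segments = 4, a 16 KiB segment splits cleanly into four 4 KiB pieces,
but needs 32 pieces at a 512-byte hardware segment size. Capping the request
at max_request_bytes(4, 512) = 2048 bytes keeps the count within the limit,
which is the kind of consistency between the size limit and the segment limit
discussed above.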