On Mon, May 06, 2024 at 08:05:09AM +0200, Christoph Hellwig wrote:
> Can we take a step back first?
>
> Current the blk-map user buffer handling decided to either pin
> the memory and use that directly or use the normal user copy helpers
> through copy_page_to_iter/copy_page_from_iter.
>
> Why do we even pin the memory here to then do an in-kernel copy instead
> of doing the copy_from/to_user which is going to be a lot more efficient?

Unlike blk-map, the integrity user buffer will fall back to a copy if the
ubuf has too many segments, whereas blk_rq_map_user() fails with EINVAL
in that case. For user integrity, we have to pin the buffer anyway to get
the true segment count and check it against the queue limits, so the copy
to/from takes advantage of that needed pin.

That EINVAL has been the source of a lot of "bugs" where we have to
explain why huge pages are necessary for largish (>512k) transfer nvme
passthrough commands. It might be a nice feature if blk_rq_map_user()
behaved like blk_integrity_map_user() for that condition.

> Sort of related to that is that this does driver the copy to user and
> unpin from bio_integrity_free, which is a low-level routine. It really
> should be driven from the highlevel blk-map code that is the I/O
> submitter, just like the data side. Shoe-horning uaccess into the
> low-level block layer plumbing is just going to get us into trouble.

Okay, I think I see what you're saying. We can make the existing use more
like the blk-map code for callers using struct request. The proposed
io_uring generic read/write user metadata would need something different,
but that looks reasonable.
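
To make the difference in behavior concrete, here is a rough userspace
model of the two policies described above. It is only a sketch, not kernel
code: count_segments(), map_data_model(), map_integrity_model() and the
MAP_* values are hypothetical names, and a "segment" is simplified to a run
of consecutive page frame numbers. As an illustration of the >512k case,
assuming a hypothetical limit of 128 segments and 4 KiB pages, a fully
fragmented buffer tops out at 128 * 4 KiB = 512 KiB before the segment
check trips.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names throughout; this models the policy only. */
enum map_result { MAP_DIRECT = 0, MAP_COPIED = 1 };

/*
 * Count runs of consecutive page frame numbers. This stands in for the
 * "true segment count", which is only known once the pages are pinned.
 */
static size_t count_segments(const unsigned long *pfns, size_t npages)
{
	size_t nsegs = 0;

	assert(pfns != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		/* A new segment starts wherever contiguity breaks. */
		if (i == 0 || pfns[i] != pfns[i - 1] + 1)
			nsegs++;
	}
	return nsegs;
}

/* Data-side policy as described: too many segments is an error. */
static int map_data_model(const unsigned long *pfns, size_t npages,
			  size_t max_segs)
{
	if (count_segments(pfns, npages) > max_segs)
		return -EINVAL;
	return MAP_DIRECT;
}

/* Integrity-side policy as described: too many segments means bounce. */
static int map_integrity_model(const unsigned long *pfns, size_t npages,
			       size_t max_segs)
{
	if (count_segments(pfns, npages) > max_segs)
		return MAP_COPIED;
	return MAP_DIRECT;
}
```

With huge pages the same buffer collapses into far fewer segments, which is
why the data-side model stops returning -EINVAL in that case.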