On Sat, Jan 27, 2024 at 12:57:45PM -0500, Kent Overstreet wrote:
> On Fri, Jan 19, 2024 at 04:24:29PM +0000, Matthew Wilcox wrote:
> > - What are we going to do about bio_vecs?
>
> For bios and biovecs, I think it's important to keep in mind the
> distinction between the code that owns and submits the bio, and the
> consumer underneath.
>
> The code underneath could just as easily work with pfns, and the code
> above got those pages from somewhere else, so it doesn't _need_ the bio
> for access to those pages/folios (it would be a lot of refactoring
> though).
>
> But I've been thinking about going in a different direction - what if we
> unified iov_iter and bio? We've got ~3 different scatter-gather types
> that an IO passes through down the stack, and it would be lovely if we
> could get it down to just one; e.g. for DIO, pinning pages right at the
> copy_from_user boundary.

Yes, but ...

One of the things that Xen can do and Linux can't is I/O to/from memory
that doesn't have an associated struct page. We have all kinds of hacks
in place to get around that right now, and I'd like to remove those.

Since we want that kind of memory (let's take, e.g., GPU memory as an
example) to be mappable to userspace, and we want to be able to do DIO
to that memory, that points us to using a non-page-based structure right
from the start. Yes, if it happens to be backed by pages we need to
'pin' them in some way (I'd like to get away from per-page or even
per-folio pinning, but we'll see about that), but the data structure
that we use to represent that memory as it moves through the I/O
subsystem needs to be physical address based.

So my 40,000 foot view is that we do something like get_user_phyrs() at
the start of DIO and pass the phyr to the filesystem; the filesystem then
passes one or more phyrs to the block layer, the block layer gives the
phyrs to the driver, and the driver DMA maps the phyr.
Yes, the IO completion path (for buffered IO) needs to figure out which
folios are described by this phyr, but that's a phys_to_folio() call away.
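
To make the shape of that concrete, here is a minimal userspace sketch of
what a physical-address-based range might look like. Everything in it is
an assumption for illustration: the struct layout, the helper names
(phyr_first_pfn(), phyr_last_pfn(), phyr_nr_pfns(), phyr_split()) and the
fixed 4 KiB page size are hypothetical, not an existing kernel API. It
only shows that a (start, length) pair is enough to recover the page
frames a range touches, and that a layer can carve off part of a range
before handing it down.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. A real kernel takes this from the arch. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical physical range: a byte-granular span of physical memory. */
struct phyr {
	uint64_t addr;	/* starting physical address */
	uint64_t len;	/* length in bytes */
};

/* First page frame number touched by the range. */
static inline uint64_t phyr_first_pfn(const struct phyr *p)
{
	return p->addr >> SKETCH_PAGE_SHIFT;
}

/* Last page frame number touched by the range (inclusive). */
static inline uint64_t phyr_last_pfn(const struct phyr *p)
{
	assert(p->len > 0);	/* an empty range touches no frame */
	return (p->addr + p->len - 1) >> SKETCH_PAGE_SHIFT;
}

/* Number of page frames touched, counting partially covered ones. */
static inline uint64_t phyr_nr_pfns(const struct phyr *p)
{
	return phyr_last_pfn(p) - phyr_first_pfn(p) + 1;
}

/*
 * Move up to max_len bytes from the front of *p into *head, e.g. when a
 * layer has to hand a smaller range to the layer below it.
 * Returns the number of bytes moved; *p keeps the remainder.
 */
static inline uint64_t phyr_split(struct phyr *p, uint64_t max_len,
				  struct phyr *head)
{
	uint64_t n = p->len < max_len ? p->len : max_len;

	head->addr = p->addr;
	head->len = n;
	p->addr += n;
	p->len -= n;
	return n;
}
```

A completion path could walk from phyr_first_pfn() to phyr_last_pfn() and
look up the folio for each frame. That lookup is the part the
phys_to_folio() mentioned above would provide, and it is deliberately
left out of this sketch.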