On Tue, Jun 25, 2019 at 01:54:21PM -0600, Logan Gunthorpe wrote:
> Well whether it's dma_addr_t, phys_addr_t, pfn_t the result isn't all
> that different. You still need roughly the same 'if' hooks for any
> backed memory that isn't in the linear mapping and you can't get a
> kernel mapping for directly.
>
> It wouldn't be too hard to do a similar patch set that uses something
> like phys_addr_t instead and have a request and queue flag for support
> of non-mappable memory. But you'll end up with very similar 'if' hooks
> and we'd have to clean up all bio-using drivers that access the struct
> pages directly.

We'll need to clean that mess up anyway, and I've been chugging
along doing some of that.  A lot still assume no highmem, so we need
to convert them over to something that kmaps anyway.  If we get
the abstraction right that will actually help converting over to
a better representation.

> Though, we'd also still have the problem of how to recognize when the
> address points to P2PDMA and needs to be translated to the bus offset.
> The map-first inversion was what helped here because the driver
> submitting the requests had all the information. Though it could be
> another request flag and indicating non-mappable memory could be a flag
> group like REQ_NOMERGE_FLAGS -- REQ_NOMAP_FLAGS.

That assumes the whole request has the same kind of memory, which is a
simplifying assumption.  My idea was that if we had our new bio_vec like
this:

struct bio_vec {
	phys_addr_t		paddr; // 64-bit on 64-bit systems
	unsigned long		len;
};

we have a hole behind len where we could store flags.  Preferably
optionally based on a P2P or other magic memory types config option so
that 32-bit systems with 32-bit phys_addr_t actually benefit from the
smaller and better packing structure.

> If you think any of the above ideas sound workable I'd be happy to try
> to code up another prototype.

It sounds workable.  To me some of the first steps are cleanups
independent of how the bio_vec is eventually going to look like.
That is making the DMA-API internals work on phys_addr_t, which also
unifies the map_resource implementation with map_page.  I plan to do
that relatively soon.  The next is sorting out access to bio data by
virtual address.  All these need nice kmapping helpers that avoid too
much open coding.  I was going to look into that next, mostly to kill
the block layer bounce buffering code.  Similar things will also be
needed at the scatterlist level I think.

After that we need to do more audits of how bv_page is still used.
Something like a bv_phys() helper that does
"page_to_phys(bv->bv_page) + bv->bv_offset" might come in handy for
example.
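
To make the bv_phys() idea and the flags-in-the-hole layout a bit more
concrete, here is a rough userspace sketch, not kernel code.  The
phys_addr_t typedef, PAGE_SHIFT, the toy mem_map and page_to_phys() are
stand-ins for the kernel definitions.  struct phys_bio_vec,
BVEC_F_P2PDMA and bvec_to_phys_bvec() are hypothetical names made up for
illustration.  The sketch assumes a 32-bit len, since that is what
leaves room for flags next to a 64-bit paddr.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel definitions (assumptions of this sketch). */
typedef uint64_t phys_addr_t;		/* assume 64-bit physical addresses */
#define PAGE_SHIFT	12		/* assume 4 KiB pages */

struct page { unsigned long flags; };
static struct page mem_map[16];		/* toy flat memory map */

/* Flat-memmap model: a page's index in mem_map is its page frame number. */
static phys_addr_t page_to_phys(const struct page *page)
{
	return (phys_addr_t)(page - mem_map) << PAGE_SHIFT;
}

/* The page-based bio_vec layout. */
struct bio_vec {
	struct page	*bv_page;
	unsigned int	bv_len;
	unsigned int	bv_offset;
};

/* The helper suggested above: physical address of the segment start. */
static phys_addr_t bv_phys(const struct bio_vec *bv)
{
	return page_to_phys(bv->bv_page) + bv->bv_offset;
}

/* Hypothetical flag bit, not an existing kernel definition. */
#define BVEC_F_P2PDMA	(1u << 0)

/*
 * Hypothetical physical-address based vector.  With an 8-byte paddr and
 * a 4-byte len, typical 64-bit ABIs pad the struct to 16 bytes anyway,
 * so the flags word sits where the padding would otherwise be.
 */
struct phys_bio_vec {
	phys_addr_t	paddr;
	uint32_t	len;
	uint32_t	flags;
};

static_assert(sizeof(struct phys_bio_vec) == 16, "expected 8 + 4 + 4 bytes");

/* Convert a page-based vector, tagging it with per-segment flags. */
static struct phys_bio_vec bvec_to_phys_bvec(const struct bio_vec *bv,
					     uint32_t flags)
{
	struct phys_bio_vec pv = {
		.paddr	= bv_phys(bv),
		.len	= bv->bv_len,
		.flags	= flags,
	};

	return pv;
}
```

Keeping the flags per segment rather than per request is what would
avoid the "whole request has the same kind of memory" assumption
mentioned earlier.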