From: Christoph Hellwig <hch@xxxxxx> Sent: Monday, August 30, 2021 5:01 AM
>
> Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
> previous version and tried to come up with a better version.  This
> mostly untested branch:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap
>
> gets us there for swiotlb and the channel infrastructure.  I've started
> looking at the network driver and didn't get anywhere due to other work.
>
> As far as I can tell the network driver does gigantic multi-megabyte
> vmalloc allocations for the send and receive buffers, which are then
> passed to the hardware, but always copied to/from when interacting
> with the networking stack.  Did I see that right?  Are these big
> buffers actually required, unlike the normal buffer management schemes
> in other Linux network drivers?
>
> If so, I suspect the best way to allocate them is by not using vmalloc
> but just discontiguous pages, and then use kmap_local_pfn, where the
> PFN includes the shared_gpa offset, when actually copying from/to the
> skbs.

As a quick overview, I think there are four places where the
shared_gpa_boundary must be applied to adjust the guest physical address
that is used.  Each requires mapping a corresponding virtual address
range.  Here are the four places:

1)  The so-called "monitor pages" that are a core communication
    mechanism between the guest and Hyper-V.  These are two single
    pages, and the mapping is handled by calling memremap() for each of
    the two pages.  See Patch 7 of Tianyu's series.

2)  The VMbus channel ring buffers.  You have proposed using your new
    vmap_phys_range() helper, but I don't think that works here.  More
    details below.

3)  The network driver send and receive buffers.  vmap_phys_range()
    should work here.

4)  The swiotlb memory used for bounce buffers.  vmap_phys_range()
    should work here as well.

Case #2 above does an unusual mapping.  The ring buffer consists of a
ring buffer header page, followed by one or more pages that are the
actual ring buffer.  The pages making up the actual ring buffer are
mapped twice in succession.  For example, if the ring buffer has 4 pages
(one header page and three ring buffer pages), the contiguous virtual
mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.  The
duplicate contiguous mapping allows the code that reads or writes the
actual ring buffer to not worry about wrap-around, because writing off
the end of the ring buffer is automatically wrapped around by the
mapping.  The amount of data read or written in one batch never exceeds
the size of the ring buffer, and after a batch is read or written, the
read or write indices are adjusted to put them back into the range of
the first mapping of the actual ring buffer pages.  So there's method to
the madness, and the technique works pretty well.  But this kind of
mapping is not amenable to using vmap_phys_range().

Michael
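
To make case #1 concrete, here is a rough sketch of remapping one
monitor page above the shared GPA boundary with memremap().  This is
not the actual code in Patch 7; the names monitor_page_pa and
shared_gpa_boundary are illustrative only:

#include <linux/io.h>
#include <linux/mm.h>
#include <linux/types.h>

/* Sketch only: remap a single monitor page above the shared GPA boundary */
static void *remap_monitor_page(phys_addr_t monitor_page_pa,
				u64 shared_gpa_boundary)
{
	/*
	 * In an isolated VM the page must be accessed through its shared
	 * (decrypted) alias, i.e. the physical address plus the
	 * shared_gpa_boundary offset.
	 */
	return memremap(monitor_page_pa + shared_gpa_boundary,
			PAGE_SIZE, MEMREMAP_WB);
}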
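
To show why case #2 doesn't fit a helper like vmap_phys_range(), here is
a rough sketch of the double mapping using vmap_pfn(): the data pages
are listed twice after the header page, and every PFN is offset above
the shared GPA boundary.  Again, this is illustrative only, not the code
in Tianyu's series:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Sketch only: map a ring buffer (1 header page + N data pages) so that
 * the data pages appear twice in succession in the virtual mapping,
 * with every PFN offset above the shared GPA boundary.
 */
static void *map_ring_buffer(struct page **pages, unsigned int page_cnt,
			     u64 shared_gpa_boundary)
{
	unsigned long gpa_off = shared_gpa_boundary >> PAGE_SHIFT;
	unsigned long *pfns;
	void *vaddr;
	int i;

	/* 1 header page + 2 copies of the (page_cnt - 1) data pages */
	pfns = kcalloc(page_cnt * 2 - 1, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return NULL;

	/* The header page is mapped once */
	pfns[0] = page_to_pfn(pages[0]) + gpa_off;

	/* The data pages are listed twice so ring accesses wrap naturally */
	for (i = 0; i < page_cnt - 1; i++) {
		pfns[i + 1] = page_to_pfn(pages[i + 1]) + gpa_off;
		pfns[i + page_cnt] = pfns[i + 1];
	}

	vaddr = vmap_pfn(pfns, page_cnt * 2 - 1, PAGE_KERNEL);
	kfree(pfns);
	return vaddr;
}

With page_cnt = 4 this produces the 0, 1, 2, 3, 1, 2, 3 layout described
above.  Because the virtual range deliberately maps some physical pages
twice, it can't be described as a single contiguous physical range,
which is why vmap_phys_range() doesn't fit this case.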