> From: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx>
> Sent: Monday, November 21, 2022 4:24 PM
>
> On Mon, Nov 21, 2022 at 11:51:48AM -0800, Dexuan Cui wrote:
> > When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> > allocates buffers using vzalloc(), and needs to share the buffers with the
> > host OS by calling set_memory_decrypted(), which is not working for
> > vmalloc() yet. Add the support by handling the pages one by one.
>
> Why do you use vmalloc here in the first place?

We changed to vmalloc() long ago, mainly for 2 reasons:

1) __alloc_pages() only allows us to allocate up to 4MB of contiguous
pages, but we need a 16MB buffer in the Hyper-V vNIC driver for better
performance.

2) Due to memory fragmentation, we have seen that the page allocator can
fail to allocate 2 contiguous pages, even though the system has a lot of
free memory. We need to support Hyper-V vNIC hot addition, so we changed
to vmalloc(). See

b679ef73edc2 ("hyperv: Add support for physically discontinuous receive buffer")
06b47aac4924 ("Drivers: net-next: hyperv: Increase the size of the sendbuf region")

> Will you also adjust direct mapping to have shared bit set?
>
> If not, we will have problems with load_unaligned_zeropad() when it will
> access shared pages via non-shared mapping.
>
> If direct mapping is adjusted, we will get direct mapping fragmentation.

load_unaligned_zeropad() was added 10 years ago by Linus in
e419b4cc5856 ("vfs: make word-at-a-time accesses handle a non-existing page"),
so this seemingly-strange usage is legitimate.

Sorry, I don't know how to adjust the direct mapping. Do you mean I should
do something like the below in tdx_enc_status_changed_for_vmalloc() for
every 'start_va':

	pa = slow_virt_to_phys(start_va);
	set_memory_decrypted(phys_to_virt(pa), 1);

?

But IIRC this issue is not specific to vmalloc()? e.g. we get 1 page by
__get_free_pages(GFP_KERNEL, 0) or kmalloc(PAGE_SIZE, GFP_KERNEL), and we
call set_memory_decrypted() for the page. How can we make sure the callers
of load_unaligned_zeropad() can't access the page via a non-shared mapping?

It looks like you have a patchset to address the issue (it looks like it
hasn't been merged into the mainline?):
https://lwn.net/ml/linux-kernel/20220614120231.48165-11-kirill.shutemov@xxxxxxxxxxxxxxx/

BTW, I'll drop tdx_enc_status_changed_for_vmalloc() and try to enhance the
existing tdx_enc_status() to support both the direct mapping and vmalloc()
(a rough sketch is appended at the end of this mail).

> Maybe tap into swiotlb buffer using DMA API?

I doubt the Hyper-V vNIC driver here can call dma_alloc_coherent() to get
a 16MB buffer from the swiotlb buffers. I'm looking at
dma_alloc_coherent() -> dma_alloc_attrs() -> dma_direct_alloc(), which
typically calls __dma_direct_alloc_pages() to allocate contiguous memory
pages (which can't exceed the 4MB limit; note there is no virtual IOMMU
in a guest on Hyper-V).

It looks like we can't force dma_direct_alloc() to call
dma_direct_alloc_no_mapping(), because even if we set the
DMA_ATTR_NO_KERNEL_MAPPING flag, force_dma_unencrypted() is still always
true for a TDX guest.
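
For completeness, here is a minimal sketch (not the actual patch) of the
"handle the pages one by one" idea, i.e. one function covering both the
direct mapping and vmalloc(). The function name decrypt_shared_pages() and
the call pattern are made up for illustration; it simply assumes that
calling set_memory_decrypted() on the direct-map alias of each page is
what flips the shared bit:

#include <linux/mm.h>           /* is_vmalloc_addr(), PAGE_SIZE */
#include <linux/set_memory.h>   /* set_memory_decrypted() */
#include <asm/pgtable_types.h>  /* slow_virt_to_phys() */
#include <asm/io.h>             /* phys_to_virt() */

/*
 * Illustration only: decrypt 'numpages' pages starting at a kernel
 * virtual address that may live either in the direct mapping or in the
 * vmalloc area.
 */
static int decrypt_shared_pages(unsigned long addr, int numpages)
{
	int i, ret;

	/* Direct mapping: physically contiguous, one call is enough. */
	if (!is_vmalloc_addr((void *)addr))
		return set_memory_decrypted(addr, numpages);

	/*
	 * vmalloc(): the underlying pages are not physically contiguous,
	 * so resolve each page and flip the shared bit on its direct-map
	 * alias one page at a time.
	 */
	for (i = 0; i < numpages; i++, addr += PAGE_SIZE) {
		phys_addr_t pa = slow_virt_to_phys((void *)addr);

		ret = set_memory_decrypted((unsigned long)phys_to_virt(pa), 1);
		if (ret)
			return ret;
	}

	return 0;
}

This obviously doesn't answer the load_unaligned_zeropad() concern by
itself; it only shows the per-page conversion.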
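And on the DMA API idea, for reference, the call I was considering is
roughly the following (the function name and the 'dev' argument are just
placeholders for illustration); as far as I can tell it still funnels into
dma_direct_alloc() and so needs physically contiguous memory:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/*
 * Illustration only: a 16MB coherent allocation for the vNIC receive
 * buffer.  Without a virtual IOMMU in the guest this goes
 * dma_alloc_coherent() -> dma_alloc_attrs() -> dma_direct_alloc(),
 * which typically allocates physically contiguous pages, so it runs
 * into the same 4MB limit and fragmentation problems as __alloc_pages().
 */
static void *netvsc_try_dma_recv_buf(struct device *dev,
				     dma_addr_t *dma_handle)
{
	return dma_alloc_coherent(dev, 16 * 1024 * 1024, dma_handle,
				  GFP_KERNEL);
}

So unless there is a way to make this path hand back non-contiguous
pages, it doesn't look like a drop-in replacement for the 16MB vzalloc()
buffer.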