On Thu, 16 Mar 2023, Juergen Gross wrote:
> On 16.03.23 14:53, Alex Deucher wrote:
> > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@xxxxxxxx> wrote:
> > >
> > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> > > > >
> > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > hardware virtualization support when possible. It will use
> > > > > > > > > > the hardware IOMMU support instead of xen-swiotlb, so
> > > > > > > > > > disable swiotlb if the current domain is Xen PVH.
> > > > > > > > >
> > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can
> > > > > > > > > it get away without resorting to swiotlb in certain cases
> > > > > > > > > (like I/O to an address-restricted device)?
> > > > > > > >
> > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there
> > > > > > > > is no need for swiotlb-xen in Dom0. Address translations are
> > > > > > > > done by the IOMMU, so we can use guest physical addresses
> > > > > > > > instead of machine addresses for DMA. This is a similar case to
> > > > > > > > Dom0 on ARM when the IOMMU is available (see
> > > > > > > > include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect; the
> > > > > > > > corresponding case is XENFEAT_not_direct_mapped).
> > > > > > >
> > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > address-restricted devices? They may still need e.g.
> > > > > > > a 32-bit address to be programmed in, and if the kernel has
> > > > > > > memory beyond the 4G boundary, not all I/O buffers may fulfill
> > > > > > > this requirement.
> > > > > >
> > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > than 4GB.
> > > > > >
> > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > device gets programmed by Linux using its guest physical addresses
> > > > > > (not machine addresses).
> > > > > >
> > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > guest physical address to use to program the device, the same way
> > > > > > it does on native. The device would be fine as it always uses
> > > > > > Linux-provided <4GB addresses. After the IOMMU translation
> > > > > > (pagetable setup by Xen), we could get any address, including >4GB
> > > > > > addresses, and that is expected to work.
> > > > >
> > > > > I understand that's the "normal" way of working. But whatever the
> > > > > swiotlb is used for in baremetal Linux, that would similarly require
> > > > > its use in PVH (or HVM) aiui. So unconditionally disabling it in PVH
> > > > > would look to me like an incomplete attempt to disable its use
> > > > > altogether on x86. What difference between PVH and baremetal am I
> > > > > missing here?
> > > >
> > > > swiotlb is not usable for GPUs even on bare metal. They often have
> > > > hundreds of megs or even gigs of memory mapped on the device at any
> > > > given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > the chip family).
> > >
> > > But the swiotlb isn't per device; it is system global.
> >
> > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > So you get to pick one.
>
> The swiotlb is used only for buffers which are not within the DMA mask of a
> device (see dma_direct_map_page()). So an AMD GPU supporting a 44-bit DMA
> mask won't use the swiotlb unless you have a buffer above a guest physical
> address of 16TB (so basically never).
>
> Disabling swiotlb in such a guest would, OTOH, mean that a device with only
> a 32-bit DMA mask passed through to this guest couldn't work with buffers
> above 4GB.
>
> I don't think this is acceptable.

From the point of view of the Xen subsystem in Linux, the only thing we
need to do is to make sure *not* to enable swiotlb_xen (yes,
"swiotlb_xen", not the global swiotlb) on PVH, because it is not needed
anyway.

I think we should leave the global "swiotlb" setting alone. The global
swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
have a way to deal with swiotlb/GPU incompatibilities.

We just have to avoid making things worse on Xen, and for that we only
need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
swiotlb, then we have a good Linux configuration capable of handling the
GPU properly.

Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
false on native (non-Xen) x86?