Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

Stefano Stabellini <sstabellini@xxxxxxxxxx> · Thu, 16 Mar 2023 16:09:44 -0700 (PDT)

On Thu, 16 Mar 2023, Juergen Gross wrote:
> On 16.03.23 14:53, Alex Deucher wrote:
> > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@xxxxxxxx> wrote:
> > > 
> > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> > > > > 
> > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > hardware
> > > > > > > > > > virtualization support when possible. It will using the
> > > > > > > > > > hardware IOMMU
> > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if
> > > > > > > > > > current domain is
> > > > > > > > > > Xen PVH.
> > > > > > > > > 
> > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can
> > > > > > > > > it get
> > > > > > > > > away without resorting to swiotlb in certain cases (like I/O
> > > > > > > > > to an
> > > > > > > > > address-restricted device)?
> > > > > > > > 
> > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there
> > > > > > > > is no
> > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by
> > > > > > > > the IOMMU
> > > > > > > > so we can use guest physical addresses instead of machine
> > > > > > > > addresses for
> > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is
> > > > > > > > available
> > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
> > > > > > > > corresponding
> > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > 
> > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > address-restricted
> > > > > > > devices? They may still need e.g. a 32-bit address to be
> > > > > > > programmed in,
> > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O
> > > > > > > buffers
> > > > > > > may fulfill this requirement.
> > > > > > 
> > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > than
> > > > > > 4GB.
> > > > > > 
> > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > device
> > > > > > gets programmed by Linux using its guest physical addresses (not
> > > > > > machine
> > > > > > addresses).
> > > > > > 
> > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > guest
> > > > > > physical address to use to program the device, the same way it does
> > > > > > on
> > > > > > native. The device would be fine as it always uses Linux-provided
> > > > > > <4GB
> > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we
> > > > > > could get any address, including >4GB addresses, and that is
> > > > > > expected to
> > > > > > work.
> > > > > 
> > > > > I understand that's the "normal" way of working. But whatever the
> > > > > swiotlb
> > > > > is used for in baremetal Linux, that would similarly require its use
> > > > > in
> > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
> > > > > to
> > > > > me like an incomplete attempt to disable its use altogether on x86.
> > > > > What
> > > > > difference of PVH vs baremetal am I missing here?
> > > > 
> > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > the chip family).
> > > 
> > > But the swiotlb isn't per device, but system global.
> > 
> > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > So you get to pick one.
> 
> The swiotlb is used only for buffers which are not within the DMA mask of a
> device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
> won't use the swiotlb unless you have a buffer above guest physical address of
> 16TB (so basically never).
> 
> Disabling swiotlb in such a guest would OTOH mean, that a device with only
> 32 bit DMA mask passed through to this guest couldn't work with buffers
> above 4GB.
> 
> I don't think this is acceptable.

>From the Xen subsystem in Linux point of view, the only thing we need to
do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
the global swiotlb) on PVH because it is not needed anyway.

I think we should leave the global "swiotlb" setting alone. The global
swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
have a way to deal with swiotlb/GPU incompatibilities.

We just have to avoid making things worse on Xen, and for that we just
need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
swiotlb, then we have a good Linux configuration capable of handling the
GPU properly.

Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
false on native (non-Xen) x86?