Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

Alex Deucher <alexdeucher@xxxxxxxxx> · Fri, 17 Mar 2023 10:45:48 -0400

On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini
<sstabellini@xxxxxxxxxx> wrote:
>
> On Thu, 16 Mar 2023, Juergen Gross wrote:
> > On 16.03.23 14:53, Alex Deucher wrote:
> > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@xxxxxxxx> wrote:
> > > >
> > > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> > > > > >
> > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > > hardware
> > > > > > > > > > > virtualization support when possible. It will using the
> > > > > > > > > > > hardware IOMMU
> > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if
> > > > > > > > > > > current domain is
> > > > > > > > > > > Xen PVH.
> > > > > > > > > >
> > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can
> > > > > > > > > > it get
> > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O
> > > > > > > > > > to an
> > > > > > > > > > address-restricted device)?
> > > > > > > > >
> > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there
> > > > > > > > > is no
> > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by
> > > > > > > > > the IOMMU
> > > > > > > > > so we can use guest physical addresses instead of machine
> > > > > > > > > addresses for
> > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is
> > > > > > > > > available
> > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
> > > > > > > > > corresponding
> > > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > >
> > > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > > address-restricted
> > > > > > > > devices? They may still need e.g. a 32-bit address to be
> > > > > > > > programmed in,
> > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O
> > > > > > > > buffers
> > > > > > > > may fulfill this requirement.
> > > > > > >
> > > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > > than
> > > > > > > 4GB.
> > > > > > >
> > > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > > device
> > > > > > > gets programmed by Linux using its guest physical addresses (not
> > > > > > > machine
> > > > > > > addresses).
> > > > > > >
> > > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > > guest
> > > > > > > physical address to use to program the device, the same way it does
> > > > > > > on
> > > > > > > native. The device would be fine as it always uses Linux-provided
> > > > > > > <4GB
> > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we
> > > > > > > could get any address, including >4GB addresses, and that is
> > > > > > > expected to
> > > > > > > work.
> > > > > >
> > > > > > I understand that's the "normal" way of working. But whatever the
> > > > > > swiotlb
> > > > > > is used for in baremetal Linux, that would similarly require its use
> > > > > > in
> > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
> > > > > > to
> > > > > > me like an incomplete attempt to disable its use altogether on x86.
> > > > > > What
> > > > > > difference of PVH vs baremetal am I missing here?
> > > > >
> > > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > > the chip family).
> > > >
> > > > But the swiotlb isn't per device, but system global.
> > >
> > > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > > So you get to pick one.
> >
> > The swiotlb is used only for buffers which are not within the DMA mask of a
> > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
> > won't use the swiotlb unless you have a buffer above guest physical address of
> > 16TB (so basically never).
> >
> > Disabling swiotlb in such a guest would OTOH mean, that a device with only
> > 32 bit DMA mask passed through to this guest couldn't work with buffers
> > above 4GB.
> >
> > I don't think this is acceptable.
>
> From the Xen subsystem in Linux point of view, the only thing we need to
> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
> the global swiotlb) on PVH because it is not needed anyway.
>
> I think we should leave the global "swiotlb" setting alone. The global
> swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
> have a way to deal with swiotlb/GPU incompatibilities.
>
> We just have to avoid making things worse on Xen, and for that we just
> need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
> doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
> swiotlb, then we have a good Linux configuration capable of handling the
> GPU properly.
>
> Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
> false on native (non-Xen) x86?

In most cases we have an IOMMU enabled and IIRC, TTM has slightly
different behavior for memory allocation depending on whether swiotlb
would be needed or not.

Alex