Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

Christian König <christian.koenig@xxxxxxx> · Tue, 21 Mar 2023 19:55:20 +0100

Am 17.03.23 um 15:45 schrieb Alex Deucher:
On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini
<sstabellini@xxxxxxxxxx> wrote:
On Thu, 16 Mar 2023, Juergen Gross wrote:
On 16.03.23 14:53, Alex Deucher wrote:
On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@xxxxxxxx> wrote:
On 16.03.23 14:45, Alex Deucher wrote:
On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
On 16.03.2023 00:25, Stefano Stabellini wrote:
On Wed, 15 Mar 2023, Jan Beulich wrote:
On 15.03.2023 01:52, Stefano Stabellini wrote:
On Mon, 13 Mar 2023, Jan Beulich wrote:
On 12.03.2023 13:01, Huang Rui wrote:
Xen PVH is the paravirtualized mode and takes advantage of
hardware
virtualization support when possible. It will using the
hardware IOMMU
support instead of xen-swiotlb, so disable swiotlb if
current domain is
Xen PVH.
But the kernel has no way (yet) to drive the IOMMU, so how can
it get
away without resorting to swiotlb in certain cases (like I/O
to an
address-restricted device)?
I think Ray meant that, thanks to the IOMMU setup by Xen, there
is no
need for swiotlb-xen in Dom0. Address translations are done by
the IOMMU
so we can use guest physical addresses instead of machine
addresses for
DMA. This is a similar case to Dom0 on ARM when the IOMMU is
available
(see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
corresponding
case is XENFEAT_not_direct_mapped).
But how does Xen using an IOMMU help with, as said,
address-restricted
devices? They may still need e.g. a 32-bit address to be
programmed in,
and if the kernel has memory beyond the 4G boundary not all I/O
buffers
may fulfill this requirement.
In short, it is going to work as long as Linux has guest physical
addresses (not machine addresses, those could be anything) lower
than
4GB.

If the address-restricted device does DMA via an IOMMU, then the
device
gets programmed by Linux using its guest physical addresses (not
machine
addresses).

The 32-bit restriction would be applied by Linux to its choice of
guest
physical address to use to program the device, the same way it does
on
native. The device would be fine as it always uses Linux-provided
<4GB
addresses. After the IOMMU translation (pagetable setup by Xen), we
could get any address, including >4GB addresses, and that is
expected to
work.
I understand that's the "normal" way of working. But whatever the
swiotlb
is used for in baremetal Linux, that would similarly require its use
in
PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
to
me like an incomplete attempt to disable its use altogether on x86.
What
difference of PVH vs baremetal am I missing here?
swiotlb is not usable for GPUs even on bare metal.  They often have
hundreds or megs or even gigs of memory mapped on the device at any
given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
the chip family).
But the swiotlb isn't per device, but system global.
Sure, but if the swiotlb is in use, then you can't really use the GPU.
So you get to pick one.
The swiotlb is used only for buffers which are not within the DMA mask of a
device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
won't use the swiotlb unless you have a buffer above guest physical address of
16TB (so basically never).

Disabling swiotlb in such a guest would OTOH mean, that a device with only
32 bit DMA mask passed through to this guest couldn't work with buffers
above 4GB.

I don't think this is acceptable.
 From the Xen subsystem in Linux point of view, the only thing we need to
do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
the global swiotlb) on PVH because it is not needed anyway.

I think we should leave the global "swiotlb" setting alone. The global
swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
have a way to deal with swiotlb/GPU incompatibilities.

We just have to avoid making things worse on Xen, and for that we just
need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
swiotlb, then we have a good Linux configuration capable of handling the
GPU properly.

Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
false on native (non-Xen) x86?
In most cases we have an IOMMU enabled and IIRC, TTM has slightly
different behavior for memory allocation depending on whether swiotlb
would be needed or not.

Well "slightly different" is an understatement. We need to disable quite 
a bunch of features to make swiotlb work with GPUs.

Especially userptr and inter device sharing won't work any more.

Regards,
Christian.

Alex