Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

Jan Beulich <jbeulich@xxxxxxxx> · Thu, 16 Mar 2023 14:58:49 +0100

On 16.03.2023 14:53, Alex Deucher wrote:
> On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@xxxxxxxx> wrote:
>>
>> On 16.03.23 14:45, Alex Deucher wrote:
>>> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>
>>>> On 16.03.2023 00:25, Stefano Stabellini wrote:
>>>>> On Wed, 15 Mar 2023, Jan Beulich wrote:
>>>>>> On 15.03.2023 01:52, Stefano Stabellini wrote:
>>>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote:
>>>>>>>> On 12.03.2023 13:01, Huang Rui wrote:
>>>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
>>>>>>>>> virtualization support when possible. It will using the hardware IOMMU
>>>>>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
>>>>>>>>> Xen PVH.
>>>>>>>>
>>>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
>>>>>>>> away without resorting to swiotlb in certain cases (like I/O to an
>>>>>>>> address-restricted device)?
>>>>>>>
>>>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
>>>>>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
>>>>>>> so we can use guest physical addresses instead of machine addresses for
>>>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
>>>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
>>>>>>> case is XENFEAT_not_direct_mapped).
>>>>>>
>>>>>> But how does Xen using an IOMMU help with, as said, address-restricted
>>>>>> devices? They may still need e.g. a 32-bit address to be programmed in,
>>>>>> and if the kernel has memory beyond the 4G boundary not all I/O buffers
>>>>>> may fulfill this requirement.
>>>>>
>>>>> In short, it is going to work as long as Linux has guest physical
>>>>> addresses (not machine addresses, those could be anything) lower than
>>>>> 4GB.
>>>>>
>>>>> If the address-restricted device does DMA via an IOMMU, then the device
>>>>> gets programmed by Linux using its guest physical addresses (not machine
>>>>> addresses).
>>>>>
>>>>> The 32-bit restriction would be applied by Linux to its choice of guest
>>>>> physical address to use to program the device, the same way it does on
>>>>> native. The device would be fine as it always uses Linux-provided <4GB
>>>>> addresses. After the IOMMU translation (pagetable setup by Xen), we
>>>>> could get any address, including >4GB addresses, and that is expected to
>>>>> work.
>>>>
>>>> I understand that's the "normal" way of working. But whatever the swiotlb
>>>> is used for in baremetal Linux, that would similarly require its use in
>>>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
>>>> me like an incomplete attempt to disable its use altogether on x86. What
>>>> difference of PVH vs baremetal am I missing here?
>>>
>>> swiotlb is not usable for GPUs even on bare metal.  They often have
>>> hundreds or megs or even gigs of memory mapped on the device at any
>>> given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
>>> the chip family).
>>
>> But the swiotlb isn't per device, but system global.
> 
> Sure, but if the swiotlb is in use, then you can't really use the GPU.
> So you get to pick one.

Yet that "pick one" then can't be an unconditional disable in the source code.
If there's no way to avoid swiotlb on a per-device basis, then users will need
to be told to arrange for this via command line option when they want to use
the GPU is certain ways.

Jan