Re: Design session notes: GPU acceleration in Xen

On 18.06.24 at 16:12, Demi Marie Obenour wrote:
On Tue, Jun 18, 2024 at 08:33:38AM +0200, Christian König wrote:
> On 18.06.24 at 02:57, Demi Marie Obenour wrote:
>> On Mon, Jun 17, 2024 at 10:46:13PM +0200, Marek Marczykowski-Górecki
>> wrote:
>>> On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
>>>> On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
>>>>> In both cases, the device physical
>>>>> addresses are identical to dom0’s physical addresses.
>>>>
>>>> Yes, but a PV dom0 physical address space can be very scattered.
>>>>
>>>> IIRC there's an hypercall to request physically contiguous memory for
>>>> PV, but you don't want to be using that every time you allocate a
>>>> buffer (not sure it would support the sizes needed by the GPU
>>>> anyway).
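For reference, the Linux PV path for that hypercall is
xen_create_contiguous_region(), which wraps XENMEM_exchange. A minimal
sketch of the per-buffer cost being discussed (the helper name
make_buffer_contiguous() is hypothetical, not an existing kernel
function):

#include <asm/io.h>
#include <linux/mm.h>
#include <xen/xen-ops.h>

/*
 * Hypothetical helper: swap the 2^order frames backing an existing
 * buffer for a machine-contiguous range via XENMEM_exchange. Paying
 * this cost on every GPU buffer allocation is what the thread argues
 * against.
 */
static int make_buffer_contiguous(void *buf, unsigned int order)
{
	dma_addr_t dma_handle;

	return xen_create_contiguous_region(virt_to_phys(buf), order,
					    64 /* address_bits limit */,
					    &dma_handle);
}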
>>
>>> Indeed, that isn't going to fly. In older Qubes versions we had PV
>>> sys-net with PCI passthrough for a network card. After some uptime it
>>> was basically impossible to restart it and still have enough contiguous
>>> memory for the network driver, at least not without shutting down a lot
>>> of other things to free more memory. And that was for _much_ smaller
>>> buffers, like 2M or 4M.
>>
>> Ouch!  That makes me wonder if all GPU drivers actually need physically
>> contiguous buffers, or if it is (as I suspect) driver-specific. CCing
>> Christian König who has mentioned issues in this area.

> Well, GPUs don't need physically contiguous memory to function, but if
> they only get 4k pages to work with, it means a quite large (up to 30%)
> performance penalty.

The status quo is "no GPU acceleration at all", so 70% of bare metal
performance would be amazing right now.

Well, AMD uses the native context approach in Xen, which delivers over 90% of bare-metal performance.

Pierre-Eric can tell you more, but we certainly have GPU solutions in production with Xen that would suffer greatly if the underlying memory were fragmented like this.

However, the implementation should not preclude eliminating this
performance penalty in the future.

What size pages do GPUs need for good performance?  Is it the same as
CPU huge pages?

2MiB are usually sufficient.
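To make that concrete, a minimal sketch assuming a Linux guest and only
stock kernel APIs (alloc_gpu_chunk() is a hypothetical name, not an
existing driver function):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/sizes.h>

/*
 * Back GPU buffers with physically contiguous 2 MiB chunks, the
 * granularity mentioned above. get_order(SZ_2M) is 9 with 4 KiB
 * base pages.
 */
static struct page *alloc_gpu_chunk(void)
{
	return alloc_pages(GFP_KERNEL | __GFP_NOWARN, get_order(SZ_2M));
}

The matching free is __free_pages(page, get_order(SZ_2M)).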

Regards,
Christian.

PV dom0 doesn't get huge pages at all, but PVH and HVM guests do, and
the goal is to move away from PV guests as they have lots of unrelated
problems.

> So scattering memory like you described is probably a very bad idea if you
> want any halfway decent performance.

For an initial prototype a 30% performance penalty is acceptable, but
it's good to know that memory fragmentation needs to be avoided.

> Regards,
> Christian

Thanks for the prompt response!



