On Wed, Jun 19, 2024 at 12:27 PM Christian König
<christian.koenig@xxxxxxx> wrote:
>
> On 18.06.24 at 16:12, Demi Marie Obenour wrote:
> > On Tue, Jun 18, 2024 at 08:33:38AM +0200, Christian König wrote:
> > > On 18.06.24 at 02:57, Demi Marie Obenour wrote:
> > >> On Mon, Jun 17, 2024 at 10:46:13PM +0200, Marek Marczykowski-Górecki wrote:
> > >>> On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
> > >>>> On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> > >>>>> In both cases, the device physical addresses are identical to
> > >>>>> dom0’s physical addresses.
> > >>>>
> > >>>> Yes, but a PV dom0 physical address space can be very scattered.
> > >>>>
> > >>>> IIRC there's a hypercall to request physically contiguous memory
> > >>>> for PV, but you don't want to be using that every time you
> > >>>> allocate a buffer (not sure it would support the sizes needed by
> > >>>> the GPU anyway).
> > >>
> > >>> Indeed, that isn't going to fly. In older Qubes versions we had a
> > >>> PV sys-net with PCI passthrough for a network card. After some
> > >>> uptime it was basically impossible to restart it and still have
> > >>> enough contiguous memory for the network driver, and that was with
> > >>> _much_ smaller buffers, like 2M or 4M. At least not without
> > >>> shutting down a lot more things to free some more memory.
> > >>
> > >> Ouch! That makes me wonder if all GPU drivers actually need
> > >> physically contiguous buffers, or if it is (as I suspect)
> > >> driver-specific. CCing Christian König, who has mentioned issues
> > >> in this area.
> > >
> > > Well, GPUs don't need physically contiguous memory to function, but
> > > if they only get 4k pages to work with it means quite a large (up
> > > to 30%) performance penalty.
> >
> > The status quo is "no GPU acceleration at all", so 70% of bare metal
> > performance would be amazing right now.
>
> Well, AMD uses the native context approach in Xen, which delivers over
> 90% of bare metal performance.
>
> Pierre-Eric can tell you more, but we certainly have GPU solutions in
> production with Xen which would suffer greatly if the underlying memory
> were fragmented like this.
>
> > However, the implementation should not preclude eliminating this
> > performance penalty in the future.
> >
> > What size pages do GPUs need for good performance? Is it the same as
> > CPU huge pages?
> >

2MiB pages are usually sufficient. Larger pages are helpful for both
system memory and VRAM, but it's more important for VRAM.

Alex

>
> Regards,
> Christian.
>
> > PV dom0 doesn't get huge pages at all, but PVH and HVM guests do, and
> > the goal is to move away from PV guests as they have lots of
> > unrelated problems.
> >
> > > So scattering memory like you described is probably a very bad idea
> > > if you want any halfway decent performance.
> >
> > For an initial prototype a 30% performance penalty is acceptable, but
> > it's good to know that memory fragmentation needs to be avoided.
> >
> > > Regards,
> > > Christian
> >
> > Thanks for the prompt response!
>
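
For reference on the "hypercall to request physically contiguous memory"
that Roger mentions above: in Linux this is wrapped by the
xen_create_contiguous_region() / xen_destroy_contiguous_region() helpers,
which (if I recall correctly) issue XENMEM_exchange to swap the backing
machine frames. Below is a minimal, illustrative sketch of how a PV dom0
driver might use them for a DMA buffer; the contig_buf_alloc/contig_buf_free
names are made up for this example and are not from the thread.

```c
#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/mm.h>
#include <xen/xen-ops.h>

/* Hypothetical helper: allocate a buffer that is machine-contiguous. */
static int contig_buf_alloc(size_t size, void **vaddr, dma_addr_t *dma_handle)
{
	unsigned int order = get_order(size);
	struct page *page;
	int rc;

	/* Guest-physically contiguous allocation first. */
	page = alloc_pages(GFP_KERNEL, order);
	if (!page)
		return -ENOMEM;

	/*
	 * Ask Xen to make the range machine-contiguous as well, limited
	 * here to a 32-bit machine address range (a typical DMA
	 * constraint).  This is the call that becomes unreliable once
	 * machine memory is fragmented, as Marek describes above.
	 */
	rc = xen_create_contiguous_region(page_to_phys(page), order, 32,
					  dma_handle);
	if (rc) {
		__free_pages(page, order);
		return rc;
	}

	*vaddr = page_address(page);
	return 0;
}

/* Hypothetical helper: undo the exchange before freeing the pages. */
static void contig_buf_free(void *vaddr, size_t size)
{
	unsigned int order = get_order(size);

	xen_destroy_contiguous_region(virt_to_phys(vaddr), order);
	free_pages((unsigned long)vaddr, order);
}
```

Doing such exchanges for every large GPU buffer would churn the machine
address space, which is why the thread treats per-allocation contiguity
requests as a non-starter for PV dom0.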