Passthrough device memory throughput issue

Hi All,

I have been working on making LookingGlass (LG) (https://github.com/gnif/LookingGlass) viable to use inside a Linux guest for Windows -> Linux frame streaming. Currently LG works very well when the client runs natively on the host, streaming frames from a Windows guest via an IVSHMEM device. However, when the client runs inside a Linux guest, we seem to be hitting a memory performance wall.

Before I continue, here is my hardware configuration:

  ThreadRipper 1950X in NUMA mode.
  GeForce 1080Ti passed through to a Windows guest
  AMD Vega 56 passed through to a Linux guest

Both the Windows and Linux guests are bound to the same NUMA node, as are their memory allocations. Memory copy performance in the Linux guest matches native memory copy performance at ~40GB/s. Windows copy performance is slower, seeming to hit a wall at ~14GB/s. That is slower than I would have expected, but that is a separate issue; suffice it to say it's plenty fast enough for what I am trying to accomplish here.

Windows is feeding captured frames at 1920x1200 @ 32bpp into the IVSHMEM virtual device, which the Linux guest is using as its input. The data transfer rate matches the ~14GB/s figure above, which in theory allows for over 1,600 frames per second. But when I take this buffer and try to feed it to the AMD Vega, I see an abysmal transfer rate of ~131MB/s (~15fps). Copying the shared memory into an intermediate buffer before feeding the data to the GPU doesn't make a difference.
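As a rough sanity check of those figures, the frame size and the frame rate each transfer rate can sustain can be worked out as below. This is only a sketch: `frame_bytes` and `frames_per_second` are hypothetical helper names, and it assumes that "GB/s" and "MB/s" above are binary units (GiB/s and MiB/s), which is the reading that reproduces the quoted frame rates.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one uncompressed frame in bytes.
 * Assumes a whole number of bytes per pixel (e.g. 32bpp -> 4 bytes). */
static uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
  assert(bits_per_pixel % 8 == 0);
  return (uint64_t)width * height * (bits_per_pixel / 8);
}

/* Frames per second that a transfer rate (in bytes per second) can sustain. */
static double frames_per_second(double bytes_per_second, uint64_t frame_size)
{
  assert(frame_size > 0);
  return bytes_per_second / (double)frame_size;
}
```

A 1920x1200 frame at 32bpp is 9,216,000 bytes. Under the binary-unit assumption, `frames_per_second(14.0 * 1024 * 1024 * 1024, frame_bytes(1920, 1200, 32))` comes out a little above 1,600, and the same call with `131.0 * 1024 * 1024` comes out just under 15.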

Now one might think the rendering code is at fault here (and it might be). However, if I instead dynamically create the buffer for each frame, I do not see any performance issues. For example, the code below generates a vertically repeating gradient that scrolls horizontally.

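static int offset = 0;              /* persists across frames to scroll the pattern */
char * data = malloc(frameSize);    /* fresh buffer in ordinary guest RAM */
for(int i = 0; i < frameSize; ++i)
  data[i] = i + offset;             /* wraps to the low 8 bits, giving a gradient */
++offset;
render(data, frameSize);            /* same upload path as the shared memory frames */
free(data);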

Is there something I am overlooking with regard to this buffer transfer?
Could DMA be rejecting the buffer because it's not in system RAM but in a memory-mapped BAR? And if so, why doesn't copying it to an intermediate buffer first help?
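
One way to narrow this down would be to time the copy out of the shared memory on its own, separately from the upload to the GPU. The following is a minimal sketch of such a measurement, not code from LG: `copy_rate` is a hypothetical helper, and `src` is assumed to be any readable mapping, either an ordinary malloc'd buffer for a baseline or the pointer to the mapped IVSHMEM region.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Average memcpy rate in bytes per second when reading `size` bytes from
 * `src` into a freshly allocated buffer, repeated `iterations` times.
 * Returns a negative value if the measurement could not be made. */
static double copy_rate(const void *src, size_t size, int iterations)
{
  assert(src != NULL && size > 0 && iterations > 0);

  unsigned char *dst = malloc(size);
  if (!dst)
    return -1.0;

  volatile unsigned char sink = 0;
  struct timespec start, end;

  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < iterations; ++i)
  {
    memcpy(dst, src, size);
    /* Read a byte back through a volatile to discourage the compiler
     * from discarding the copy. */
    sink ^= dst[(size_t)i % size];
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  (void)sink;
  free(dst);

  double seconds = (double)(end.tv_sec - start.tv_sec)
                 + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
  if (seconds <= 0.0)
    return -1.0;

  return ((double)size * iterations) / seconds;
}
```

Calling this once with a malloc'd buffer of `frameSize` bytes and once with the shared memory pointer would show whether the read from the mapped region is itself the slow step, or whether the time is being lost later in `render`.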

Thank you for your time,
-Geoff