Re: [PATCH 00/12] Set of patches for further support of VSync

Frediano Ziglio <fziglio@xxxxxxxxxx> · Tue, 28 Mar 2017 13:27:28 -0400 (EDT)

On Tue, Mar 28, 2017 at 5:14 PM, Frediano Ziglio <fziglio@xxxxxxxxxx> wrote:
>
> The main goal is to reduce the time spent in the GDI callback
> (PresentDisplayOnly) and avoid the situation where processing takes more
> than 2 seconds, triggering the class driver watchdog.
>
> 1. We offload sending of drawable commands to a separate thread (waiting
>    for room in the command ring may take unpredictable time)
> 2. In case the usage of device memory is high, allocation of the bitmap for
>    the rectangle to draw may also take unpredictable time (note that a
>    single full screen redraw requires >3 MB of space)
>    So, we make drawable object allocations from the GDI callback fast and
>    non-forced, and in case they fail we provide an alternate allocation
>    from the OS heap
> 3. Before sending a drawable command, the thread shall take care of the
>    objects that were allocated from the OS heap and allocate them from
>    device memory (now we are not limited by time)
> 4. We still do not enable VSync automatically, but this can be done for
>    evaluation/testing purposes via a setting in the driver's registry
>

A big issue of this approach is that it does not entirely solve
the problem but moves it.

We can't spend too much time waiting for memory in the OS callback.
In our own thread we can wait as long as we want.

Instead of waiting for device memory we fall back to system memory,
and when we can send commands we copy the data back to device memory
and send it, increasing system memory usage and memory copies.

Yes, that's correct. Our processing in the OS callback must be fast, and I do not
see how we can solve it without using host memory and without skipping operations.
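To make the trade-off discussed here concrete, below is a minimal user-space sketch of the "non-forced" idea: the callback path tries device memory once and never waits, and the worker thread migrates heap-backed objects later. All names (DevicePool, Drawable, allocateNonForced, migrateToDevice) are hypothetical and are not taken from the qxl-wddm-dod sources; the pool is only a byte counter standing in for real device memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of device memory: only tracks how many bytes are in use.
struct DevicePool {
    std::size_t capacity;
    std::size_t used = 0;

    explicit DevicePool(std::size_t c) : capacity(c) {}

    // Reserve n bytes if they fit right now; never waits.
    bool tryReserve(std::size_t n) {
        if (n > capacity - used) return false;
        used += n;
        return true;
    }
    void release(std::size_t n) { used -= n; }
};

// Hypothetical drawable: pixel data plus a flag saying where it is accounted.
struct Drawable {
    std::vector<std::uint8_t> data;  // stands in for the bitmap contents
    bool inDeviceMemory;             // false means "held on the OS heap"
};

// Fast path, as it would be called from the time-limited callback:
// one attempt on device memory, otherwise keep the copy on the heap.
Drawable allocateNonForced(DevicePool& pool, const std::uint8_t* src,
                           std::size_t n) {
    return Drawable{std::vector<std::uint8_t>(src, src + n),
                    pool.tryReserve(n)};
}

// Slow path, as it would be called from the worker thread before sending.
// Returns false if there is still no room; the caller may wait and retry,
// since this thread is not bound by the callback time limit.
bool migrateToDevice(DevicePool& pool, Drawable& d) {
    if (d.inDeviceMemory) return true;
    if (!pool.tryReserve(d.data.size())) return false;
    d.inDeviceMemory = true;
    return true;
}
```

The sketch also shows the concern raised in this thread: nothing here bounds how many heap-backed drawables can be outstanding while the worker is waiting for room.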

However, I cannot see any limit, so potentially we'll fill
system memory until the guest crashes. And if we add a limit,
potentially this will just move the hang to later.

We allocate pageable memory, which is much less limited than non-pageable memory,
and the typical amount of available pageable memory is > 1 GB.
When working in a LAN environment, there are rare cases when we need to allocate host memory.
With long end-to-end delay under heavy scenarios I did not see a huge amount of outstanding allocations.

As far as I know we always have available 3 times the amount
of memory of the maximum frame buffer, so in theory plenty of
space. But watching the drawing from the client I can see
a lot of redrawing of the same area again and again, so maybe
this is causing the issues with the memory.

Maybe we can find a smarter way to solve this memory issue?

I would suggest looking for possible improvements later.
I have some ideas but they do not invalidate the current solution.
I was talking with Jonathon about the different memory layouts of different drivers.
It turns out that this "new" DOD driver uses a different layout from the previous Windows
driver. Specifically, Bar0 (DEVRAM) is used for the frame buffer and monitor configs,
while everything else is in VRAM. Previous Windows drivers used VRAM only
for off-screen surfaces (so basically they were always using DEVRAM). But according to
our data and to http://www.ovirt.org/documentation/draft/video-ram/ the allocation
for VRAM can be really small (8 MB), which is not good for a WDDM driver.

Some time ago for Linux I proposed a patch that basically has a kind of fallback for memory
allocations. If allocating on one bar failed, the other was tried (and deallocation of course
detected the bar used based on the pointer). I think it would make sense to try such
a strategy, even to make guest system upgrades easier. Writing a patch.
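
As a rough illustration of that fallback strategy (not the Linux patch itself, and not the patch being written), here is a minimal sketch. Region and TwoBarAllocator are hypothetical names, each region is a trivial bump allocator that ignores alignment and only resets when it becomes empty, and the point is only the two behaviours described: try one bar then the other, and pick the bar on free from the pointer's address range.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one memory bar: a fixed address range with a
// trivial bump allocator and a count of live allocations.
struct Region {
    std::uint8_t* base;
    std::size_t   size;
    std::size_t   used = 0;
    std::size_t   live = 0;

    Region(std::uint8_t* b, std::size_t s) : base(b), size(s) {}

    // True if p points inside this region's address range.
    bool contains(const void* p) const {
        auto addr  = reinterpret_cast<std::uintptr_t>(p);
        auto start = reinterpret_cast<std::uintptr_t>(base);
        return addr >= start && addr < start + size;
    }
    // Returns nullptr instead of waiting when the request does not fit.
    void* alloc(std::size_t n) {
        if (n == 0 || n > size - used) return nullptr;
        void* p = base + used;
        used += n;
        ++live;
        return p;
    }
    // Simplification: space is only reclaimed once the region is empty.
    void release() {
        if (live > 0 && --live == 0) used = 0;
    }
};

struct TwoBarAllocator {
    Region primary;    // e.g. the bar normally used for this kind of object
    Region secondary;  // the other bar, tried only on failure

    void* alloc(std::size_t n) {
        if (void* p = primary.alloc(n)) return p;
        return secondary.alloc(n);  // fallback
    }
    // The owning bar is detected from the pointer, so callers need not
    // remember where an object came from.
    bool free(void* p) {
        if (primary.contains(p))   { primary.release();   return true; }
        if (secondary.contains(p)) { secondary.release(); return true; }
        return false;  // not ours
    }
};
```

With such a scheme a guest configured with a small VRAM bar would still get allocations served from the other bar rather than failing outright, which is the upgrade scenario mentioned above.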

> Yuri Benditovich (12):
>   qxl-wddm-dod: Prepare system thread for rendering
>   qxl-wddm-dod: Use rendering offload thread
>   qxl-wddm-dod: Introduce TimeMeasurement class for timing debugging
>   qxl-wddm-dod: Debug warning on long wait on event
>   qxl-wddm-dod: Reduce amount of unnecessary printouts
>   qxl-wddm-dod: Registry-based control over VSync
>   qxl-wddm-dod: Set VSync indication period to 200ms
>   qxl-wddm-dod: Prepare for failure to allocate memory
>   qxl-wddm-dod: PutBytesAlign supports non-forced allocation
>   qxl-wddm-dod: Optimize allocation of memory chunks
>   qxl-wddm-dod: Implement non-forced bitmap allocation
>   qxl-wddm-dod: Non-forced memory allocations with VSync
>
>  qxldod/QxlDod.cpp | 581 +++++++++++++++++++++++++++++++++++++++++++++---------
>  qxldod/QxlDod.h   |  87 +++++++-
>  qxldod/driver.cpp |  35 ++++
>  3 files changed, 606 insertions(+), 97 deletions(-)
>
Frediano
