As Javier already stated, the difference was not CPU usage but user experience.
I think the main point is that the Windows engine is free to do its own work while our driver completes
the operation. Looking at the calls Windows makes to our driver, it seems that Windows keeps another
frame buffer and periodically (I didn't try to work out exactly when or how often) passes
dirty rects to update the card's (QXL in this case) frame buffer.
I have the feeling (I don't know how to verify it) that if the driver returns STATUS_PENDING,
Windows collapses dirty rects while the driver is doing its job, basically reducing the number of calls
to our driver while still performing the proper frame buffer updates.
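To make the collapsing hypothesis concrete, here is a toy model in C. It is not the qxl-wddm-dod code and it does not claim to reproduce what Windows actually does (that is exactly the unverified part); all names (Rect, Model, present, worker_done) are hypothetical. While a previous update is still being pushed, further dirty rects are merged into one pending bounding box, and the model counts how many present calls arrive while the worker is busy versus how many updates actually reach the device.

```c
/* Hypothetical model, not driver code: dirty rects that arrive while the
 * worker is busy are merged into one pending bounding box. */
#include <assert.h>
#include <stdbool.h>

typedef struct { int left, top, right, bottom; } Rect; /* right/bottom exclusive */

typedef struct {
    bool busy;        /* worker still pushing a previous update */
    bool has_pending; /* a merged update is waiting for the worker */
    Rect pending;     /* bounding box of rects received while busy */
    int presents;     /* total present calls */
    int overlapped;   /* present calls that arrived while busy */
    int pushes;       /* updates actually handed to the device */
} Model;

/* Smallest rectangle containing both a and b. */
static Rect rect_union(Rect a, Rect b) {
    Rect r = {
        a.left < b.left ? a.left : b.left,
        a.top < b.top ? a.top : b.top,
        a.right > b.right ? a.right : b.right,
        a.bottom > b.bottom ? a.bottom : b.bottom,
    };
    return r;
}

/* One present call: returns immediately, never waits for the device. */
static void present(Model *m, Rect dirty) {
    m->presents++;
    if (!m->busy) {
        m->busy = true; /* worker starts pushing this rect right away */
        m->pushes++;
        return;
    }
    m->overlapped++;
    m->pending = m->has_pending ? rect_union(m->pending, dirty) : dirty;
    m->has_pending = true;
}

/* Worker finished its current push: take the merged rect, or go idle. */
static void worker_done(Model *m) {
    assert(m->busy);
    if (m->has_pending) {
        m->has_pending = false;
        m->pushes++; /* push the merged rect; worker stays busy */
    } else {
        m->busy = false;
    }
}
```

For example, presenting {0,0,10,10}, then {5,5,20,20} and {30,0,40,8} while the first push is still in flight gives 3 presents and 2 overlapped calls; the pending box becomes {5,0,40,20}, and after the worker finishes twice only 2 pushes have reached the device. A bounding box keeps each merge O(1) but can overstate the area (here a 15x15 and a 10x8 rect become a 35x20 box), so a real implementation might prefer a short list of rects.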
Frediano
Hello Javier,

After implementing the pushing thread in the current qxl-wddm-dod, I measured the CPU consumption in the PresentDisplayOnly call and in the thread that pushes drawables to the device. The results show that the time to push drawables is negligible compared with the time spent copying dirty rects to device memory (on average the ratio is ~1/500), and in the typical case the pushing thread serves only a single 'present' operation and then waits for the next one.
I tried it on Win10 with 2-3 GB of memory and 1-2 CPUs under regular user activity (opening windows, redrawing, scrolling, etc.).

Am I missing something?

Thanks,
Yuri

On Fri, Nov 25, 2016 at 11:11 AM, Javier Celaya <javier.celaya@xxxxxxxxxxx> wrote:

Hello Yuri

On Fri, 25-11-2016 at 01:08 +0200, Yuri Benditovich wrote:

I'm porting to [qxl-wddm-dod] the set of flexvdi changes related to executing 'present display only' events in a separate thread. There are 2 questions below on which I'd like to know your opinion.

I see 2 aspects here:
- reliability
- performance

Reliability:
I see in the flexvdi mailing list an existing report of a BSOD upon system shutdown. A possible cause is the lack of synchronization between system flows, hardware availability and the worker thread state (the last patch in flexvdi, 'Terminate working thread on exit', introduces a termination procedure, but as far as I can see nobody calls it).
The lack of synchronization may also cause races in power management flows and (possibly) when changing the operating mode.

Question 1:
Do you have any additional recommendations on which flows should be specially checked for races with the rendering thread?

Unfortunately, the truth is, we have not thoroughly tested our code to remove these races yet. The clients this driver was intended for are still stuck using Windows XP/7, and our development is stalled.
So, I cannot think of any situation you should check that you do not already know about.

Performance:
It looks like the change should not affect total CPU consumption for rendering; it splits more or less the same operations over 2 different threads. It is still possible that the change improves the general user experience, because operation completion is indicated to the OS sooner.

We were not trying to reduce total CPU consumption. After all, the driver just copies rects from main memory to VRAM and passes them to the spice server; there is little to reduce there. Rather, we tried to increase the throughput of graphic operations by not locking the DirectX subsystem while we wait for the spice server to accept new drawables. That is, we do not mind using more CPU if that results in painting faster.

On the other hand, I was thinking that maybe we could get the DirectX subsystem to provide the rects already in VRAM if we described it as a linear memory segment on driver initialization. That way, the copying operation could also be removed. However, I am not sure whether this actually works or even how to do it; it is just an idea.

Question 2:
Do you have any ideas on how to make a quantitative evaluation of this possible improvement in user experience?
I am thinking about:
- finding scenarios where we receive rendering calls (PresentDisplayOnly) while the worker thread is still processing the previous operation. If they exist, this may mean that some bottleneck in GDI has been removed.
- writing or getting a tool that loads the graphics adapter with heavy operations (like continuously moving a window, scrolling, etc.) while measuring CPU consumption.

We used a simple tool to measure the performance: it creates a window and continuously issues WM_PAINT events in which the full background is filled with a color, then measures the number of processed events per second (not CPU).
It is quite naive, but it provides a good starting reference: with the XDDM QXL driver on Windows 7, the tool processes almost twice as many paint events as it does on Windows 8 with the WDDM QXL driver. There are other measurements you can try to obtain, such as how long it takes for a paint event to reach the spice server queue, ready to be sent to the client (although I'm not sure how to measure it). This delay affects the user's perception of performance.

Please share your thoughts.

Thanks,
Yuri
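Javier's tool is Windows-specific, but the metric it uses (completed repaints per second of wall-clock time, not CPU time) can be sketched portably in C11. This example only fills an in-memory 32-bit framebuffer, so it exercises no GDI, driver or QXL path and its numbers are not comparable with the real tool; paint_full and paints_per_second are hypothetical names, and in the real tool each processed WM_PAINT event would take the place of one paint_full call.

```c
/* Portable sketch of an events-per-second metric; no GDI or driver involved. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Current wall-clock time in seconds (C11 timespec_get, TIME_UTC). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Fill the whole buffer with one color, like a full-background repaint. */
static void paint_full(uint32_t *fb, size_t pixels, uint32_t color) {
    for (size_t i = 0; i < pixels; i++)
        fb[i] = color;
}

/* Repaint for roughly `budget` seconds and return repaints per second. */
static double paints_per_second(uint32_t *fb, size_t pixels, double budget) {
    assert(budget > 0.0);
    unsigned long count = 0;
    double start = now_seconds();
    double elapsed;
    do {
        /* Change the color each time so every repaint writes new data. */
        paint_full(fb, pixels, (uint32_t)count * 0x010101u);
        count++;
        elapsed = now_seconds() - start;
    } while (elapsed < budget);
    return elapsed > 0.0 ? (double)count / elapsed : 0.0;
}
```

For example, paints_per_second(fb, 64 * 64, 0.05) repaints a 64x64 buffer for about 50 ms and returns the rate. Counting against wall-clock time is the point: moving work to another thread can leave total CPU time unchanged while still raising the event rate, which is the effect discussed in this thread. TIME_UTC can jump if the system clock is adjusted, so on Windows a monotonic source such as QueryPerformanceCounter would be the more robust choice.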
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/spice-devel