Re: "Fixes" for page flipping under PRIME on AMD & nouveau

Mario Kleiner <mario.kleiner.de@xxxxxxxxx> · Fri, 26 Aug 2016 22:07:17 +0200

On 08/18/2016 04:32 AM, Michel Dänzer wrote:
> On 18/08/16 08:51 AM, Mario Kleiner wrote:

>> That's what the ati-ddx/amdgpu-ddx does at the moment, as it detects the
>> mismatch in tiling flags and uses the DRI3/Present copy path instead of
>> the pageflip path. The problem is that the server's Present
>> implementation doesn't request a vsync'ed start of the copy operation [...]

> It waits for vblank before starting the copy.

Yes, a vblank event triggers present_execute in the server. But the
total latency from vblank event dispatch to the copy command packet
hitting the gpu is still far too high to avoid tearing. I tried again
and couldn't find a single intel/amd/nvidia gpu here that doesn't tear
more or less badly, depending on load, with DRI3/Present copy swaps.
Even tear-free wouldn't be good enough for my kind of applications, as
crucial timing/timestamps could still frequently be off by at least
1 frame.
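
To put rough numbers on the latency argument, here is a minimal sketch. It assumes the standard 1080p60 timing (1125 total lines, 1080 visible) and a deliberately simplified model: the copy is only counted as safe if it starts before scanout of the next frame begins. The function names are made up for illustration, and the model ignores how long the copy itself takes, so starting inside vblank is necessary but not sufficient.

```python
def vblank_duration_ms(refresh_hz, vtotal, vactive):
    """Time per frame during which no visible line is scanned out."""
    frame_ms = 1000.0 / refresh_hz
    return frame_ms * (vtotal - vactive) / vtotal


def copy_starts_in_vblank(latency_ms, refresh_hz, vtotal, vactive):
    """True if a copy issued latency_ms after vblank start still begins
    before scanout of the next frame (simplified model)."""
    return latency_ms < vblank_duration_ms(refresh_hz, vtotal, vactive)


# Assumed timing: standard 1080p60, 1125 total lines, 1080 visible.
budget = vblank_duration_ms(60.0, 1125, 1080)
print(f"vblank budget: {budget:.3f} ms")

# Hypothetical dispatch latencies, not measurements.
for latency in (0.2, 0.5, 1.0, 3.0):
    ok = copy_starts_in_vblank(latency, 60.0, 1125, 1080)
    print(f"latency {latency:.1f} ms -> {'inside vblank' if ok else 'misses vblank'}")
```

Under these assumptions the whole budget is about two thirds of a millisecond, which event dispatch, server scheduling and command submission all have to fit into.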

>> There is this other approach from NVidia's Alex Goins for their
>> proprietary driver, whose patches landed in the X-Server 1.19 master
>> branch a couple of weeks ago. I haven't read his patches in detail yet,
>> and i so far couldn't successfully test them with the reference
>> implementation in modesetting ddx 1.19. Afaik there the display gpu
>> exports a pair of scanout friendly, page flipping compatible dmabufs (i
>> assume linear, contiguous, accessible by the display engines),

> FWIW, that wouldn't be possible with our "older" GPUs which can't scan
> out from GTT: A BO can be either shared with another GPU or scanout
> friendly, not both at the same time.

Ok, good to know.

>> and the offload gpu imports those and renders into them. That saves
>> one extra copy, so should be somewhat more efficient.

> Using two shared buffers actually isn't as efficient as possible wrt
> inter-GPU bandwidth.

Out of interest, why? You'd have only one detiling copy VRAM -> RAM? Or 
is it about switching some kind of GTT mappings with two buffers that is 
inefficient?

>> Setting it up seems to be more involved and less flexible though. So far
>> i couldn't make it work here for testing. Maybe bugs, maybe mistakes on
>> my side, maybe i just have the wrong hardware for it.

> Yeah, my impression has been it's a rather complicated solution geared
> towards the Intel iGPU + proprietary nVidia use case.

Setting up output source/output sink is not fun, as I have now learned:
it is rather clumsy and complex compared to render offload. I hope the
real thing will come with some fool-proof one-click setup GUI, otherwise
I don't have great hopes, given the technical skill level of my users. I
still haven't managed to get it working, not even with the new Nvidia
proprietary beta drivers on a real Optimus laptop.
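
For reference, the two setups differ in which xrandr provider option gets used. A minimal sketch that only builds the command lines, without running them. The provider names "display-gpu" and "render-gpu" are hypothetical placeholders (the real ones come from `xrandr --listproviders`), and the helper names are made up for illustration:

```python
import shlex


def output_source_cmd(provider, source):
    """Output source/sink: `provider` drives the displays and shows
    images produced by `source`."""
    return ["xrandr", "--setprovideroutputsource", provider, source]


def offload_sink_cmd(provider, sink):
    """Render offload: `provider` renders, `sink` displays the result."""
    return ["xrandr", "--setprovideroffloadsink", provider, sink]


# Hypothetical provider names; list the real ones with `xrandr --listproviders`.
for cmd in (
    output_source_cmd("display-gpu", "render-gpu"),
    offload_sink_cmd("render-gpu", "display-gpu"),
):
    print(shlex.join(cmd))
```

The commands themselves are the small part. Getting the right ddx and driver combination loaded so that both providers show up at all is where it gets clumsy.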

-mario