Afaiu the prime importing display gpu generates its own gem buffer handle (prime_fd_to_handle) from that dmabuf, importing scatter-gather tables to access the dmabuf in system ram. As far as page flipping is concerned, so far those gem buffers / radeon_bo's aren't treated any differently than native ones. During pageflip setup they get pinned into VRAM, which moves (=copies) their content from the RAM dmabuf backing store into VRAM.
Your understanding isn't correct. Buffers imported using prime always stay in GTT; they can't be moved to VRAM.
It's the DDX which copies the buffer content from the imported prime handle into a native one which is enabled for scanout.
Regards,
Christian.

On 18.08.2016 at 01:29, Mario Kleiner wrote:
On 08/17/2016 07:02 PM, Christian König wrote:
On 17.08.2016 at 18:35, Mario Kleiner wrote:
On 08/17/2016 06:27 PM, Christian König wrote:

Well I'm not an expert on this, but as far as I know the bigger problem is that the dedicated AMD hardware generations you are targeting usually can't reliably scan out from system memory without a rather complicated setup. So that is a complete NAK to the radeon changes.

Hi Christian, thanks for the feedback, but i think that's a misunderstanding. The patches don't make them scan out from system memory, they just enforce a fresh copy from RAM/GTT -> VRAM before scanning out a buffer again. I just assume there is a more elegant/clean way than this "fake" pin/unpin to GTT to essentially tell the driver that its current VRAM content is stale and needs a refresh from the up-to-date dmabuf in system RAM.

I was already wondering how the heck you got that working. What do you mean with a fresh copy from GTT to VRAM? A buffer exported by DMA-buf should never move as long as it is exported, same for a buffer pinned to VRAM.

Under DRI3/Present, the way it is currently implemented in the X-Server and Mesa, the display gpu (= normally the integrated one) is importing the dma-buf that was exported by the render offload gpu. So the actual dmabuf doesn't move, but just stays where it is in system RAM.

Afaiu the prime importing display gpu generates its own gem buffer handle (prime_fd_to_handle) from that dmabuf, importing scatter-gather tables to access the dmabuf in system ram. As far as page flipping is concerned, so far those gem buffers / radeon_bo's aren't treated any differently than native ones. During pageflip setup they get pinned into VRAM, which moves (=copies) their content from the RAM dmabuf backing store into VRAM. Then they get flipped and scanned out as usual.

The disconnect happens when such a buffer gets flipped off the scanout (and unpinned) and later on page-flipped to the scanout again. Now the driver just reuses the bo that still likely resides in VRAM (although not pinned anymore) and forgets that it was associated with some dmabuf backing in RAM which may have updated visual content. So the exporting render offload gpu happily renders new frames into the dmabuf in RAM, while radeon kms happily displays stale frames from its own copy in VRAM.

So using a DMA-buf for scanout is impossible and actually not valuable, because it shouldn't matter if we copy from GTT to VRAM because of a buffer migration or because of a copy triggered by the DDX. What are you actually trying to do here?

Make a typical Enduro laptop with an AMD iGPU + AMD dGPU work under DRI3/Present, without tearing and other ugliness, e.g., DRI_PRIME=1 glxgears -fullscreen -> discrete gpu renders, integrated gpu displays the rendered frames.

Currently the drivers use copies for handling the PresentPixmap requests, which sort of works in showing the right pictures, but gives bad tearing and undefined timing. With copies we are too slow to keep ahead of the scanout, and Present doesn't even guarantee that the copy starts vsync'ed. So at all levels, from delays in the x-server, mesa's way of doing things, command submission and the hw itself, we end up blitting in the middle of scanout. And the presentation timing isn't ever trustworthy for timing sensitive applications unless we present via page flipping.

The hack in my patch tricks the driver into migrating the bo back to GTT (skipping the actual pointless data copy though) and then back into VRAM to force a copy of fresh content from the imported dmabuf into VRAM, so page flipping flips up-to-date content into the scanout.

-mario

Regards,
Christian.

Btw. i'll be offline for the next few hours, just wanted to get this out now.

thanks,
-mario

Regards,
Christian.
On 17.08.2016 at 18:12, Mario Kleiner wrote:

Hi,

i spent some time playing with DRI3/Present + PRIME for testing how well it works for Optimus/Enduro style setups wrt. page flipping on the current kernel/mesa/xorg. I want page flipping, because neuroscience/medical applications need the reliable timing/timestamping and tear-free presentation we currently only can get via page flipping, but not the copyswap path.

Intel as display gpu + nouveau for render offload worked nicely on intel-ddx with page flipping, proper timing, dmabuf fence sync and all.

AMD uses copy swaps because radeon/amdgpu kms can't switch the scanout mode from tiled to linear on the fly during flips. That's a todo in itself. For the moment i used the ati-ddx with Option "ColorTiling/ColorTiling2D" "off" to force my pair of old Radeon HD-5770's into linear mode so page flipping can be used for prime. The current modesetting-ddx will use page flipping in any case as it doesn't detect the tiling format mismatch. nouveau uses page flips.

Turns out that prime + page flipping currently doesn't work on nouveau and amd. The first offload rendered images from the imported dmabufs show up properly, but then the display is stuck alternating between the first two or three rendered frames.

The problem is that during the pageflip ioctl we pin the dmabuf into VRAM in preparation for scanout, then unpin it when we are done with it at next flip, but the buffer stays in the VRAM memory domain. Next time we flip to the buffer again, the driver skips the DMA copy from GTT to VRAM during pinning, because the buffer's content apparently already resides in VRAM. Therefore it doesn't update the VRAM copy with the updated dmabuf content in system RAM, so freshly rendered frames from the prime export/render offload gpu never reach the display gpu and one only sees stale images.
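The failure mode described here can be reproduced in a few lines of toy code. The sketch below is not driver code: `ImportedBo`, `pin_vram`, `unpin` and `simulate` are hypothetical names, and the "copy only on a domain change" rule is an assumption taken from the description in this message, not from the radeon or nouveau sources. Whether the real drivers behave this way is exactly what the thread is discussing.

```python
# Toy model of the stale-frame symptom. All names are hypothetical;
# this mirrors the description in the message, not any kernel API.

class ImportedBo:
    """Display-GPU buffer object backed by an imported dmabuf."""

    def __init__(self, dmabuf):
        self.dmabuf = dmabuf      # shared with the render GPU: {"frame": n}
        self.domain = "GTT"       # domain the driver believes the bo is in
        self.vram_frame = None    # content of the VRAM copy

    def pin_vram(self):
        # Assumed rule: data is copied only when the domain changes.
        if self.domain != "VRAM":
            self.vram_frame = self.dmabuf["frame"]
            self.domain = "VRAM"

    def unpin(self):
        # Unpinning does not migrate the bo; it stays in the VRAM domain.
        pass


def simulate(n_frames, n_buffers=2):
    """Return the frame numbers that actually reach the scanout."""
    dmabufs = [{"frame": None} for _ in range(n_buffers)]
    bos = [ImportedBo(d) for d in dmabufs]
    shown = []
    for frame in range(1, n_frames + 1):
        i = (frame - 1) % n_buffers
        dmabufs[i]["frame"] = frame   # render GPU writes into system RAM
        bos[i].pin_vram()             # pageflip setup on the display GPU
        shown.append(bos[i].vram_frame)
        bos[i].unpin()                # buffer flipped off the scanout
    return shown


if __name__ == "__main__":
    print(simulate(5))
```

With two buffers, each one is copied into VRAM exactly once, so the model keeps alternating between the first two frames, which matches the "stuck alternating" symptom reported above.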
The attached patches for nouveau and radeon kms seem to work pretty ok: page flipping works, the display updates tear-free, dmabuf fence sync works, and onset timing/timestamping is correct. They simply pin the buffer back into GTT, then unpin, to force a move of the buffer into the GTT domain, and thereby force the following pin to do a new copy from GTT -> VRAM. The code tries to avoid a useless copy from VRAM -> GTT during the pin op.

However, the approach feels very much like a hack, so i assume this is not the proper way of doing it? I looked at what ttm has to offer, but couldn't find anything elegant and obvious. Maybe there is a way to evict a bo without actually copying data back to RAM? Or to invalidate the VRAM copy as stale? Maybe i just missed something, as i'm not very familiar with ttm.

Thoughts or suggestions?

Another insight from my hacks so far is that nouveau seems to be fast as prime exporter/render offload, but rather slow as display gpu/prime importer, as tested on a 2008 or 2009 MacBookPro dual-Nvidia laptop.

AMD, as tested with dual Radeon HD-5770, seems to be fast as prime importer/display gpu, but very slow as prime exporter/render offload, e.g., taking 16 msecs to get a 1920x1080 framebuffer into RAM. It seems that Mesa's blitImage function is the slow bit here. On r600 it seems to draw a textured triangle strip to detile the gpu renderbuffer and copy it into GTT. As drawing a textured fullscreen quad is normally much faster, something special seems to be going on there wrt. DMA?

However, i don't have a realistic Enduro test setup with AMD iGPU + dGPU, only this cobbled-together pair of HD-5770's in a MacPro, so this could be wrong.

thanks,
-mario
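As a rough illustration of the pin-to-GTT, unpin, pin-to-VRAM sequence described above, here is a minimal toy model. Everything in it is an assumption made for illustration: `ImportedBo`, `pin`, `skip_copy`, `flip_with_refresh` and `simulate` are hypothetical names, not functions from the attached patches, from TTM, or from either driver.

```python
# Toy model of the "fake GTT pin" workaround. All names are hypothetical.

class ImportedBo:
    def __init__(self, dmabuf):
        self.dmabuf = dmabuf        # shared with the render GPU: {"frame": n}
        self.domain = "GTT"
        self.vram_frame = None
        self.writebacks = 0         # VRAM -> GTT data copies actually done

    def pin(self, domain, skip_copy=False):
        if domain == self.domain:
            return                  # same domain: nothing moves
        if domain == "VRAM":
            # GTT -> VRAM move picks up the current dmabuf content.
            self.vram_frame = self.dmabuf["frame"]
        elif not skip_copy:
            # A normal VRAM -> GTT move would write the VRAM copy back.
            self.writebacks += 1
        self.domain = domain


def flip_with_refresh(bo):
    """Force a fresh GTT -> VRAM copy before every flip."""
    bo.pin("GTT", skip_copy=True)   # change the domain, skip the useless copy
    # (unpin is a no-op in this model)
    bo.pin("VRAM")                  # now a real copy from the dmabuf happens
    return bo.vram_frame


def simulate(n_frames, n_buffers=2):
    """Return (frames reaching the scanout, total VRAM -> GTT copies)."""
    dmabufs = [{"frame": None} for _ in range(n_buffers)]
    bos = [ImportedBo(d) for d in dmabufs]
    shown = []
    for frame in range(1, n_frames + 1):
        i = (frame - 1) % n_buffers
        dmabufs[i]["frame"] = frame  # render GPU writes into system RAM
        shown.append(flip_with_refresh(bos[i]))
    return shown, sum(bo.writebacks for bo in bos)


if __name__ == "__main__":
    print(simulate(5))
```

In this model every flip shows the newest frame and no VRAM -> GTT data copy is ever performed, which is the behaviour the hack aims for; it says nothing about whether this is a legitimate use of the real pin/unpin paths.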
_______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel