On 23.07.2014 15:42, Christian König wrote:
> On 23.07.2014 05:54, Michel Dänzer wrote:
>> On 21.07.2014 17:07, Christian König wrote:
>>> On 19.07.2014 03:15, Michel Dänzer wrote:
>>>> On 19.07.2014 00:47, Christian König wrote:
>>>>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>>>> I'm still not very keen on this change since I still don't
>>>>>>> understand the reason why it's faster than with GTT. Definitely
>>>>>>> needs more testing on a wider range of systems.
>>>>>> Sure. If anyone wants to give this patch a spin and see if they can
>>>>>> measure any performance difference, good or bad, that would be
>>>>>> interesting.
>>>>>>
>>>>>>> Maybe limit it to APUs for now?
>>>>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an
>>>>>> even bigger win with dedicated GPUs than with the Kaveri built-in
>>>>>> GPU on my system. I suspect it may depend on the bandwidth
>>>>>> available for PCIe vs. system memory though.
>>>>> I've made a few tests today with the kernel part of the patches
>>>>> running Xonotic on Ultra in 1920 x 1080.
>>>>>
>>>>> Without any patches I get around ~47.0fps on average with my dedicated
>>>>> HD7870.
>>>>>
>>>>> Adding only "drm/radeon: Use write-combined CPU mappings of rings and
>>>>> IBs on >= SI" and that goes down to ~45.3fps.
>>>>>
>>>>> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
>>>>> SI" and the frame rate goes down to ~27.74fps.
>>>> Hmm, looks like I'll need to do more benchmarking of 3D workloads as
>>>> well.
>> I haven't been able to consistently[0] measure any significant
>> difference between all placements of the rings and IBs with Xonotic or
>> Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU
>> memory bandwidth bound rather than CS bound anyway, so a ~40% hit from
>> that kernel patch alone is very surprising.
>> Are you sure it wasn't just
>> the same kind of variation as described below?
>
> Yes, I've measured that multiple times and the results were quite
> consistent.
>
> But I didn't measure it on a Bonaire, where the bottleneck probably
> isn't the CPU load. I measured it on a fast Pitcairn

Ahem, my Bonaire is cranking out ~90fps of Xonotic Ultra at 1920x1080. :)
(And AFAIK there are even faster Bonaire variants)

> and there Xonotic was clearly affected by the patches.

Okay, I hadn't realized we're not doing any command stream checking as
of CIK, that probably explains the difference.

>>> My tests clearly show that we still can use USWC for the ring buffer on
>>> SI and probably earlier chips as well.
>> Yeah, that might be the safest approach for now.
> How about using USWC for the rings on all chips since R600

Any particular reason against doing it for older chips which support
unsnooped access as well?

> and for the IB only on CIK? As far as I can see that should do the trick
> quite well.

Yeah, sounds good.


--
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer