[AMD Official Use Only - General]

Hi Michel,

It is true that we don’t get obvious improvement on performance with these patches. The original requirement of using mcbp is that when there is a very long ib package with many draw cmds on low priority which uses up gpu utilization, we give a chance to high priority ibs executed by gpu.
The total performance could be dropped as mcbp drains the pipe and the low priority ibs would be resubmitted again after that.

This set of patches is mainly to implement priority queues by software rings. We may use other method instead of mcbp to improve it later.

Thanks,
Jiadong

-----Original Message-----
From: Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Friday, November 11, 2022 1:54 AM
To: Michel Dänzer <michel@xxxxxxxxxxx>
Cc: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>; Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On Thu, Nov 10, 2022 at 12:00 PM Michel Dänzer <michel@xxxxxxxxxxx> wrote:
>
> On 2022-11-08 09:01, Zhu, Jiadong wrote:> From: Michel Dänzer
> <michel@xxxxxxxxxxx>
> >
> >>>> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.
> >>>
> >>> Hi Michel,
> >>>
> >>> Thanks for the trying.
> >>> Is there high priority jobs running while executing glxgears?
> >>
> >> Yes, mutter is submitting high priority jobs. However, I don't think that can explain the problem by itself:
> >>
> >> mutter only draws once per display refresh cycle.
> >> Let's assume mutter's GPU work takes ~6-7ms (conservative example, should be less than that usually). That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU work from glxgears & Xwayland can run without getting preempted. Since glxgears runs at ~7000 fps without this series, it should be able to draw at least ~70 frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 of that.
> >>
> >> [0] Worst case consideration, ignoring the fact that without this series, glxgears runs at ~7000 fps while mutter sustains 60 fps.
> >
> > I reproduced the glxgears 400fps scenario locally. The issue is caused by the patch5 "drm/amdgpu: Improve the software rings priority scheduler" which slows down the low priority scheduler thread if high priority ib is under executing. I'll drop this patch as we cannot identify gpu bound according to the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent freezes.
>
> The only issue is that there's hardly any positive effect either. While constantly moving the window of a GPU-limited GpuTest benchmark in circles, most of the time it looks exactly the same as without these patches. Only occasionally, at most every few seconds, I notice that the window movement becomes smoother for an instant.
>

I think it will largely depend on the workload. The gfx pipe can only be preempted on draw boundaries so if most operations are a single draw, you probably won't see much difference.

Alex
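
For reference, the back-of-the-envelope estimate quoted above (7000 fps baseline, ~10 ms free per 60 Hz refresh cycle) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function and constant names are made up for illustration and do not come from the driver or the patches.

```python
def frames_per_window(unpreempted_fps: float, window_ms: float) -> float:
    """Frames a client can draw in window_ms at its unpreempted frame rate."""
    return unpreempted_fps * window_ms / 1000.0


def expected_fps(unpreempted_fps: float, window_ms: float, refresh_hz: float) -> float:
    """Frame rate if the client runs only in one free window per refresh cycle."""
    return frames_per_window(unpreempted_fps, window_ms) * refresh_hz


# Figures quoted in the thread (names are illustrative only).
BASELINE_FPS = 7000   # glxgears without the series
FREE_MS = 10          # time left per 60 Hz cycle after mutter's ~6-7 ms
REFRESH_HZ = 60
OBSERVED_FPS = 400    # glxgears with the full series applied

frames = frames_per_window(BASELINE_FPS, FREE_MS)
bound = expected_fps(BASELINE_FPS, FREE_MS, REFRESH_HZ)
ratio = OBSERVED_FPS / bound

print(f"frames per free window: {frames:.0f}")
print(f"expected lower bound:   {bound:.0f} fps")
print(f"observed / expected:    {ratio:.2f}")
```

The result matches the "over 4000 fps" and "only 1/10 of that" figures in the quoted message, under the same worst-case assumption that glxgears makes no progress while mutter's work is running.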