[AMD Official Use Only - General]

Hi Michel,

I reproduced the glxgears 400 fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which slows down the low-priority scheduler thread while a high-priority IB is executing. I'll drop this patch, since we cannot identify a GPU-bound workload from the unsignaled fence, etc.

Thanks,
Jiadong

-----Original Message-----
From: Michel Dänzer <michel@xxxxxxxxxxx>
Sent: Thursday, November 3, 2022 5:05 PM
To: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-11-03 03:58, Zhu, Jiadong wrote:
>> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.
>
> Hi Michel,
>
> Thanks for trying.
> Are there high-priority jobs running while glxgears is executing?

Yes, mutter is submitting high-priority jobs. However, I don't think that can explain the problem by itself: mutter only draws once per display refresh cycle. Let's assume mutter's GPU work takes ~6-7 ms (a conservative example; it should usually be less than that). That leaves ~10 ms per display refresh cycle (at a 60 Hz refresh rate) where GPU work from glxgears & Xwayland can run without getting preempted. Since glxgears runs at ~7000 fps without this series, it should be able to draw at least ~70 frames in 10 ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 of that.

[0] Worst-case consideration, ignoring the fact that without this series, glxgears runs at ~7000 fps while mutter sustains 60 fps.

> I am running glxgears while submitting high-priority IBs using amdgpu_test; the fps ranges from 6000 to 8000.

It's getting clear that artificial tests such as amdgpu_test don't suffice for evaluating the real-world impact of this kind of change.

> Continuous preemption and resubmission may cause the slow fps. Could you check how fast the trailing fence seqno advances? On my side, the "Last signaled trailing fence" value increments by less than 10 per second.

I had to go back to a kernel without this series, as it was just unusable. As this is my main machine, I don't know when I'll get a chance to check this.

--
Earthling Michel Dänzer | https://redhat.com/
Libre software enthusiast | Mesa and Xwayland developer
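
For reference, the back-of-the-envelope estimate in Michel's reply (about 70 frames in the ~10 ms left per refresh cycle, hence over 4000 fps) can be reproduced with a short Python sketch. The function name and its parameters are hypothetical, chosen only for illustration; the inputs are the approximate figures quoted in the thread, not new measurements.

```python
def estimate_fps(baseline_fps, free_ms_per_cycle, refresh_hz):
    """Rough fps estimate for a client that can only run in the part of
    each display refresh cycle not occupied by the compositor.

    Hypothetical helper: it mirrors the arithmetic in the email and
    ignores per-job overhead and preemption cost.
    """
    # Frames the client can draw in the free window of one refresh cycle.
    frames_per_cycle = baseline_fps * free_ms_per_cycle / 1000.0
    # That window recurs once per refresh cycle.
    return frames_per_cycle * refresh_hz


# Figures from the thread: ~7000 fps baseline, ~10 ms free per cycle, 60 Hz.
expected = estimate_fps(baseline_fps=7000, free_ms_per_cycle=10, refresh_hz=60)
observed = 400  # glxgears fps reported with the series applied

print(f"expected at least ~{expected:.0f} fps, observed ~{observed} fps")
print(f"observed / expected = {observed / expected:.2f}")
```

Under these assumptions the estimate is about 4200 fps, so the observed ~400 fps is roughly a tenth of what compositor preemption alone would explain.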