On Mon, Jan 6, 2025 at 12:11 PM Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> wrote:
>
> On 1/3/2025 1:00 AM, Akhil P Oommen wrote:
> > On 1/3/2025 12:02 AM, Rob Clark wrote:
> >> From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >>
> >> On mmu-500, stall-on-fault seems to stall all context banks, causing the
> >> GMU to misbehave. So limit this feature to smmu-v2 for now.
> >>
> >> This fixes an issue with an older mesa bug taking out the system
> >> because of GMU going off into the weeds.
> >>
> >> What we _think_ is happening is that, if the GPU generates 1000's of
> >> faults at ~once (which is something that GPUs can be good at), it can
> >> result in a sufficient number of stalled translations preventing other
> >> transactions from entering the same TBU.
> >>
> >> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> >
> > Reviewed-by: Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx>
> >
>
> Btw, if stall is not enabled, I think there is no point in capturing
> coredump from adreno pagefault handler. By the time we start coredump,
> gpu might have switched context.

Hmm, we do at least capture ttbr0 both in fault info and from the
current submit, so it would at least be possible to tell if you are
looking at the wrong context.

BR,
-R