On Tue, Sep 17, 2024 at 4:37 PM Konrad Dybcio <konradybcio@xxxxxxxxxx> wrote:
>
> On 17.09.2024 5:30 PM, Rob Clark wrote:
> > On Tue, Sep 17, 2024 at 6:47 AM Konrad Dybcio <konradybcio@xxxxxxxxxx> wrote:
> >>
> >> On 13.09.2024 9:51 PM, Rob Clark wrote:
> >>> From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >>>
> >>> The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
> >>> devices (x1-85, possibly others), it seems to pass that barrier while
> >>> there are still things in the event completion FIFO waiting to be
> >>> written back to memory.
> >>
> >> Can we try to force-fault around here on other GPUs and perhaps
> >> limit this workaround?
> >
> > not sure what you mean by "force-fault"...
>
> I suppose 'reproduce' is what I meant

I haven't _noticed_ it yet.. if you want to try on devices you have,
glmark2 seems to be good at reproducing.. I think the reason is a combo
of high fps (on x1-85 most scenes are north of 8k fps) so you get a lot
of context switches btwn compositor and glmark2. Most scenes are just a
clear plus single draw, and I guess the compositor is just doing a
single draw/blit. A6xx can be two draws/blits deep in its pipeline,
a7xx can be four, which maybe exacerbates this.

> > we could probably limit
> > this to certain GPUs, the only reason I didn't is (a) it should be
> > harmless when it is not needed,
>
> Do we have any realistic perf hits here?

I don't think so, we can't switch ttbr0 while the gpu is still busy so
what the sqe does for CP_SMMU_TABLE_UPDATE _should_ be equivalent.
Maybe it amounts to some extra CP cycles and memory read, but I think
that should be negligible given that the expensive thing is that we are
stalling the gpu until it is idle.

> > and (b) I have no real good way to get
> > an exhaustive list of where it is needed. Maybe/hopefully it is only
> > x1-85, but idk.
> >
> > It does bring up an interesting question about preemption, though
>
> Yeah..

The KMD does set up an xAMBLE to clear the perfcntrs on context switch.
We could maybe piggyback on that, but I guess we'd have to patch in the
fence value to wait for?

> Do we know what windows does here?

not sure, maybe akhil has some way to check. Whether a similar scenario
comes up with windows probably depends on how the winsys works. If it
dropped frames when rendering >vblank rate, you'd get fewer context
switches.

BR,
-R

> Konrad