On 22.11.2023 10:29, Krzysztof Kozlowski wrote:
> On 22/11/2023 10:06, AngeloGioacchino Del Regno wrote:
>>>>> Hey Krzysztof,
>>>>>
>>>>> This is interesting. It might be about the cores that are missing from the partial
>>>>> core_mask raising interrupts, but an external abort on non-linefetch is strange to
>>>>> see here.
>>>> I've seen such external aborts in the past, and the fault type has
>>>> often been misleading. It's unlikely to have anything to do with a
>>> Yeah, often accessing device with power or clocks gated.
>>>
>> Except my commit does *not* gate SoC power, nor SoC clocks 🙂
> It could be that something (like clocks or power supplies) was missing
> on this board/SoC, which was not critical till your patch came.
>
>> What the "Really power off ..." commit does is to ask the GPU to internally power
>> off the shaders, tilers and L2, that's why I say that it is strange to see that
>> kind of abort.
>>
>> The GPU_INT_CLEAR GPU_INT_STAT, GPU_FAULT_STATUS and GPU_FAULT_ADDRESS_{HI/LO}
>> registers should still be accessible even with shaders, tilers and cache OFF.
>>
>> Anyway, yes, synchronizing IRQs before calling the poweroff sequence would also
>> work, but that'd add up quite a bit of latency on the runtime_suspend() call, so
>> in this case I'd be more for avoiding to execute any register r/w in the handler
>> by either checking if the GPU is supposed to be OFF, or clearing interrupts, which
>> may not work if those are generated after the execution of the poweroff function.
>> Or we could simply disable the irq after power_off, but that'd be hacky (as well).
>>
>>
>> Let's see if asking to poweroff *everything* works:
> Worked.

Yes, I also ran into this issue some time ago, but I didn't report it
because I also had some power-supply-related problems on my test farm and
everything was a bit unstable. I wasn't 100% sure that the $subject patch
was responsible for the observed issues.

Now, after fixing the power supply, I confirm that the issue was revealed
by the $subject patch and that the above-mentioned change fixes the problem.
Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland