On Thu, Jul 21, 2022 at 9:04 AM Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> wrote:
>
> On 7/20/2022 11:34 AM, Akhil P Oommen wrote:
> > On 7/19/2022 3:26 PM, Rajendra Nayak wrote:
> >>
> >>
> >> On 7/19/2022 12:49 PM, Stephen Boyd wrote:
> >>> Quoting Akhil P Oommen (2022-07-18 23:37:16)
> >>>> On 7/19/2022 11:19 AM, Stephen Boyd wrote:
> >>>>> Quoting Akhil P Oommen (2022-07-18 21:07:05)
> >>>>>> On 7/14/2022 11:10 AM, Akhil P Oommen wrote:
> >>>>>>> IIUC, qcom gdsc driver doesn't ensure hardware is collapsed
> >>>>>>> since they are vote-able switches. Ideally, we should ensure
> >>>>>>> that the hw has collapsed for gpu recovery because there could
> >>>>>>> be transient votes from other subsystems like hypervisor using
> >>>>>>> their vote register.
> >>>>>>>
> >>>>>>> I am not sure how complex the plumbing to gpucc driver would be
> >>>>>>> to allow gpu driver to check hw status. OTOH, with this patch,
> >>>>>>> gpu driver does a read operation on a gpucc register which is
> >>>>>>> in always-on domain. That means we don't need to vote any
> >>>>>>> resource to access this register.
> >>>
> >>> Reading between the lines here, you're saying that you have to read the
> >>> gdsc register to make sure that the gdsc is in some state? Can you
> >>> clarify exactly what you're doing? And how do you know that something
> >>> else in the kernel can't cause the register to change after it is read?
> >>> It certainly seems like we can't be certain because there is voting
> >>> involved.
> > From gpu driver, cx_gdscr.bit[31] (power off status) register can be
> > polled to ensure that it *collapsed at least once*. We don't need to
> > care if something turns ON gdsc after that.
> >
> >>
> >> yes, this looks like the best case effort to get the gpu to recover, but
> >> the kernel driver really has no control to make sure this condition can
> >> always be met (because it depends on other entities like hyp,
> >> trustzone etc right?)
> >> Why not just put a worst case polling delay?
> >
> > I didn't get you entirely. Where do you mean to keep the polling delay?
> >>
> >>>
> >>>>>>>
> >>>>>>> Stephen/Rajendra/Taniya, any suggestion?
> >>>>> Why can't you assert a gpu reset signal with the reset APIs? This
> >>>>> series seems to jump through a bunch of hoops to get the gdsc and
> >>>>> power domain to "reset" when I don't know why any of that is
> >>>>> necessary. Can't we simply assert a reset to the hardware after
> >>>>> recovery completes so the device is back into a good known POR
> >>>>> (power on reset) state?
> >>>> That is because there is no register interface to reset GPU CX domain.
> >>>> The recommended sequence from HW design folks is to collapse both
> >>>> cx and gx gdsc to properly reset gpu/gmu.
> >>>>
> >>>
> >>> Ok. One knee jerk reaction is to treat the gdsc as a reset then and
> >>> possibly mux that request along with any power domain on/off so that if
> >>> the reset is requested and the power domain is off nothing happens.
> >>> Otherwise if the power domain is on then it manually sequences and
> >>> controls the two gdscs so that the GPU is reset and then restores the
> >>> enable state of the power domain.
> > It would be fatal to asynchronously pull the plug on CX gdsc
> > forcefully because there might be another gpu/smmu driver thread
> > accessing registers in cx domain.
> >
> > -Akhil.
> >
> But, we can move the cx collapse polling to gpucc and expose it to gpu
> driver using 'reset' framework. I am not very familiar with clk driver,
> but I did a rough prototype here (untested):
> https://zerobin.net/?d34b5f958be3b9b8#NKGzdPy9fgcuOqXZ/XqjI7b8JWcivqe+oSTf4yWHSOU=
>
> If this approach is acceptable, I will send it out as a separate series.
>

I'm not super familiar w/ reset framework, but this approach seems
like it would avoid needing to play games with working around runpm as
well. So that seems like a cleaner approach.

BR,
-R
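
For reference, the "poll until it collapsed at least once" idea quoted
above can be sketched in plain userspace C. This is only an illustration
of the polling logic, not the prototype linked in the thread and not
kernel code: the accessor type gdscr_read_fn, the function
cx_gdsc_wait_collapsed(), and the simulated register fake_gdscr are all
hypothetical names. The only detail taken from the thread is that bit 31
of the CX GDSCR reports power-off status.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the thread: cx_gdscr bit 31 is the power-off status bit. */
#define GDSCR_PWR_OFF_STATUS (UINT32_C(1) << 31)

/* Hypothetical register accessor; a real driver would go through its
 * own MMIO or regmap helpers instead. */
typedef uint32_t (*gdscr_read_fn)(void *ctx);

/*
 * Poll until the power-off status bit is observed set at least once.
 * Returns true if a collapse was seen within max_polls reads, false on
 * timeout. It does not matter if another voter turns the GDSC back on
 * afterwards; one observed collapse is enough.
 */
static bool cx_gdsc_wait_collapsed(gdscr_read_fn read_reg, void *ctx,
                                   unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (read_reg(ctx) & GDSCR_PWR_OFF_STATUS)
            return true;
        /* A real driver would delay or sleep between reads here. */
    }
    return false;
}

/* Simulated register: reports "off" once reads exceed reads_until_off,
 * standing in for a transient vote from another subsystem. */
struct fake_gdscr {
    unsigned int reads_until_off;
    unsigned int reads;
};

static uint32_t fake_gdscr_read(void *ctx)
{
    struct fake_gdscr *r = ctx;

    r->reads++;
    return (r->reads > r->reads_until_off) ? GDSCR_PWR_OFF_STATUS : 0;
}
```

The bounded loop also reflects the concern raised earlier in the thread:
since other voters (hyp, trustzone) can hold the GDSC on, the caller has
to handle the timeout case rather than assume the collapse always
happens.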