Re: [Freedreno] [PATCH v2 5/7] arm64: dts: qcom: sc7280: Update gpu register list

Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> · Wed, 20 Jul 2022 11:34:01 +0530

On 7/19/2022 3:26 PM, Rajendra Nayak wrote:

On 7/19/2022 12:49 PM, Stephen Boyd wrote:
Quoting Akhil P Oommen (2022-07-18 23:37:16)
On 7/19/2022 11:19 AM, Stephen Boyd wrote:
Quoting Akhil P Oommen (2022-07-18 21:07:05)
On 7/14/2022 11:10 AM, Akhil P Oommen wrote:
IIUC, qcom gdsc driver doesn't ensure hardware is collapsed since they
are vote-able switches. Ideally, we should ensure that the hw has
collapsed for gpu recovery because there could be transient votes from
other subsystems like hypervisor using their vote register.

I am not sure how complex the plumbing to gpucc driver would be to allow
gpu driver to check hw status. OTOH, with this patch, gpu driver does a
read operation on a gpucc register which is in always-on domain. That
means we don't need to vote any resource to access this register.

Reading between the lines here, you're saying that you have to read the
gdsc register to make sure that the gdsc is in some state? Can you
clarify exactly what you're doing? And how do you know that something
else in the kernel can't cause the register to change after it is read?
It certainly seems like we can't be certain because there is voting
involved.
From the gpu driver, cx_gdscr bit[31] (power off status) can be polled
to ensure that the gdsc *collapsed at least once*. We don't need to
care if something turns ON the gdsc after that.
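
A minimal sketch of the polling described above, written as plain C that can run outside the kernel. Only the bit position (bit[31], power off status) comes from this thread. The accessor type, the function names, the poll budget, and the simulated register are all hypothetical, and a real driver would read the register with readl() and delay between reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* bit[31] of cx_gdscr: power off status, as described in the thread. */
#define GDSCR_PWR_OFF_STATUS (1u << 31)

/* Hypothetical register accessor; a real driver would use readl(). */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until the power-off status bit has been seen set at least once.
 * Returns true if a collapse was observed within max_polls reads.
 * What happens to the gdsc afterwards is deliberately not checked.
 */
static bool wait_for_gdsc_collapse(reg_read_fn read_reg, void *ctx,
				   unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (read_reg(ctx) & GDSCR_PWR_OFF_STATUS)
			return true;
		/* A real driver would delay here between reads. */
	}
	return false;
}

/* Simulated register (assumption): reports "off" after N reads. */
struct fake_gdsc {
	unsigned int reads;
	unsigned int collapse_after;
};

static uint32_t fake_gdscr_read(void *ctx)
{
	struct fake_gdsc *g = ctx;

	g->reads++;
	return (g->reads > g->collapse_after) ? GDSCR_PWR_OFF_STATUS : 0;
}
```

The loop stops at the first read that shows the bit set, which matches the "collapsed at least once" requirement. It does not try to guarantee that the gdsc stays off.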

Yes, this looks like a best-effort way to get the gpu to recover, but
the kernel driver really has no control to make sure this condition can
always be met (because it depends on other entities like hyp, trustzone
etc., right?)
Why not just put a worst case polling delay?

I didn't get you entirely. Where do you mean to keep the polling delay?

Stephen/Rajendra/Taniya, any suggestion?
Why can't you assert a gpu reset signal with the reset APIs? This series
seems to jump through a bunch of hoops to get the gdsc and power domain
to "reset" when I don't know why any of that is necessary. Can't we
simply assert a reset to the hardware after recovery completes so the
device is back into a good known POR (power on reset) state?
That is because there is no register interface to reset GPU CX domain.
The recommended sequence from HW design folks is to collapse both cx and
gx gdsc to properly reset gpu/gmu.

Ok. One knee jerk reaction is to treat the gdsc as a reset then and
possibly mux that request along with any power domain on/off so that if
the reset is requested and the power domain is off nothing happens.
Otherwise if the power domain is on then it manually sequences and
controls the two gdscs so that the GPU is reset and then restores the
enable state of the power domain.
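
The idea in the paragraph above can be sketched roughly as follows, as a plain C model rather than kernel code. The struct, the function names, and the order in which the two gdscs are switched are assumptions made for illustration; the thread does not specify them.

```c
#include <stdbool.h>

/* Hypothetical model of a power domain backed by two gdscs. */
struct gpu_pd {
	bool enabled;        /* enable state requested for the power domain */
	bool cx_on;          /* modelled state of the cx gdsc */
	bool gx_on;          /* modelled state of the gx gdsc */
	unsigned int resets; /* number of reset sequences actually run */
};

static void gdsc_set(bool *gdsc, bool on)
{
	*gdsc = on;
}

/*
 * Treat "collapse both gdscs" as a reset request.
 * If the domain is off, do nothing. If it is on, collapse both gdscs
 * and then restore the previous enable state.
 * Returns true when a reset sequence was run.
 */
static bool gpu_pd_reset(struct gpu_pd *pd)
{
	if (!pd->enabled)
		return false; /* already off: nothing to reset */

	/* Assumed ordering: gx first, then cx. */
	gdsc_set(&pd->gx_on, false);
	gdsc_set(&pd->cx_on, false);
	pd->resets++;

	/* Restore the enable state that was in place before the reset. */
	gdsc_set(&pd->cx_on, true);
	gdsc_set(&pd->gx_on, true);
	return true;
}
```

This model has no locking and no notion of other users of the cx domain, so it says nothing about whether such a forced sequence is safe on real hardware.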
It would be fatal to asynchronously pull the plug on CX gdsc forcefully 
because there might be another gpu/smmu driver thread accessing 
registers in cx domain.

-Akhil.