On 8/13/2024 3:07 PM, Greg KH wrote:
> On Mon, Aug 05, 2024 at 04:36:28PM +0530, Abhishek Singh wrote:
>>
>> On 7/30/2024 12:46 PM, Greg KH wrote:
>>> On Tue, Jul 30, 2024 at 12:39:45PM +0530, Abhishek Singh wrote:
>>>> The user process on ARM closes the device node while closing the
>>>> session, triggers a remote call to terminate the PD running on the
>>>> DSP. If the DSP is in an unstable state and cannot process the remote
>>>> request from the HLOS, glink fails to deliver the kill request to the
>>>> DSP, resulting in a timeout error. Currently, this error is ignored,
>>>> and the session is closed, causing all the SMMU mappings associated
>>>> with that specific PD to be removed. However, since the PD is still
>>>> operational on the DSP, any attempt to access these SMMU mappings
>>>> results in an SMMU fault, leading to a panic. As the SMMU mappings
>>>> have already been removed, there is no available information on the
>>>> DSP to determine the root cause of its unresponsiveness to remote
>>>> calls. As the DSP is unresponsive to all process remote calls, use
>>>> BUG_ON to prevent the removal of SMMU mappings and to properly
>>>> identify the root cause of the DSP’s unresponsiveness to the remote
>>>> calls.
>>>>
>>>> Signed-off-by: Abhishek Singh <quic_abhishes@xxxxxxxxxxx>
>>>> ---
>>>>  drivers/misc/fastrpc.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
>>>> index 5204fda51da3..bac9c749564c 100644
>>>> --- a/drivers/misc/fastrpc.c
>>>> +++ b/drivers/misc/fastrpc.c
>>>> @@ -97,6 +97,7 @@
>>>>  #define FASTRPC_RMID_INIT_CREATE_STATIC 8
>>>>  #define FASTRPC_RMID_INIT_MEM_MAP 10
>>>>  #define FASTRPC_RMID_INIT_MEM_UNMAP 11
>>>> +#define PROCESS_KILL_SC 0x01010000
>>>>
>>>>  /* Protection Domain(PD) ids */
>>>>  #define ROOT_PD (0)
>>>> @@ -1128,6 +1129,9 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
>>>>  	fastrpc_context_get(ctx);
>>>>
>>>>  	ret = rpmsg_send(cctx->rpdev->ept, (void *)msg, sizeof(*msg));
>>>> +	/* trigger panic if glink communication is broken and the message is for PD kill */
>>>> +	BUG_ON((ret == -ETIMEDOUT) && (handle == FASTRPC_INIT_HANDLE) &&
>>>> +		(ctx->sc == PROCESS_KILL_SC));
>>>
>>> You just crashed the machine completely, sorry, but no, properly handle
>>> the issue and clean up if you can detect it, do not break systems.
>>>
>> But the Glink communication with DSP is already broken; we cannot communicate with the DSP.
>> The system will crash if we proceed with cleanup on the ARM side. If we don’t do cleanup,
>> a resource leak will occur. Eventually, the system will become dead. That’s why I am
>> crashing the device.
>
> Then explicitly call panic() if you think you really want to shut the
> system down.
>
>> What does it mean to explicitly call panic()? Are you trying to say we should use panic() instead of BUG_ON()?
>
> greg k-h
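
To make the question concrete, here is a minimal userspace sketch of the check under discussion, not driver code. It only classifies the result of the send. In the kernel, the SEND_FATAL branch is where an explicit panic() call with a descriptive message would go in place of the BUG_ON() condition, if that is what is being suggested. The names send_action, pd_kill_timed_out() and classify_send_result() are hypothetical, and the value of FASTRPC_INIT_HANDLE is assumed to match the driver; PROCESS_KILL_SC is taken from the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FASTRPC_INIT_HANDLE 1           /* assumption: mirrors the driver's value */
#define PROCESS_KILL_SC     0x01010000  /* taken from the patch above */

/* Hypothetical outcome type, used here for illustration only. */
enum send_action {
    SEND_OK,            /* message delivered */
    SEND_RETURN_ERROR,  /* ordinary failure: propagate the error to the caller */
    SEND_FATAL          /* PD kill timed out: kernel code would call panic() here */
};

/* True only when the message that timed out was the PD kill request. */
static bool pd_kill_timed_out(int ret, uint32_t handle, uint32_t sc)
{
    return ret == -ETIMEDOUT &&
           handle == FASTRPC_INIT_HANDLE &&
           sc == PROCESS_KILL_SC;
}

/* Map the send result to an action instead of hiding the check in BUG_ON(). */
static enum send_action classify_send_result(int ret, uint32_t handle, uint32_t sc)
{
    assert(ret <= 0); /* this sketch assumes 0 or a negative errno */

    if (ret == 0)
        return SEND_OK;
    if (pd_kill_timed_out(ret, handle, sc))
        return SEND_FATAL; /* in the kernel: panic() with a message naming the cause */
    return SEND_RETURN_ERROR;
}
```

Either way the machine goes down in the fatal case. The difference is that an explicit call states the intent and can carry a message, while every other error path still returns normally.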