Ben reported that the patch [1] breaks suspend/resume. After digging for
a while, I noticed that the problem was already there before that patch
was introduced, but it was not exposed because r535_gsp_rpc_push() doesn't
respect the caller's requirement when handling a large RPC command: it
won't wait for the reply even if the caller requires it. (Small RPCs are
fine.) After that patch series, r535_gsp_rpc_push() really waits for the
reply and receives the entire GSP message, which is required by the large
vGPU RPC command.

There are currently two GSP RPC message handling policies:

- a. don't care: discard the message before returning to the caller.
- b. receive the entire message: wait for and receive the entire message
  before returning to the caller.

On the suspend/resume path there is a large GSP command,
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, whose reply is only a GSP RPC message
header telling the driver that the request has been handled. The policy
currently used in the driver is to receive the entire message, so
r535_gsp_rpc_push() times out and returns an error when it tries to
receive a message body that never arrives. That breaks the suspend/resume
path.

This series factors out the current GSP RPC message handling policies,
introduces a new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, and adds a
kernel doc to illustrate the policies.

With this patchset, the problem can no longer be reproduced and
suspend/resume works on my L40.

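To make the difference between the policies concrete, here is a minimal
user-space sketch. It is not the driver code: the MOCK_* names, struct
mock_gsp_reply and mock_handle_reply() are hypothetical, and it assumes
that the new policy (NVKM_GSP_RPC_REPLY_POLL in the series) only needs to
see the message header before returning.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the reply policies described above. */
enum mock_reply_policy {
	MOCK_REPLY_NOWAIT,	/* a. don't care: never waits for the reply */
	MOCK_REPLY_RECV,	/* b. wait for and receive the entire message */
	MOCK_REPLY_POLL,	/* new: assumed to wait for the header only */
};

/* What the (mock) GSP actually posted in response to a command. */
struct mock_gsp_reply {
	int header_ready;	/* non-zero once the RPC header has arrived */
	size_t payload_avail;	/* payload bytes that followed the header */
};

/*
 * Returns 0 if the policy is satisfied by the reply, -ETIMEDOUT if the
 * caller would end up waiting for data that never arrives.
 */
static int mock_handle_reply(enum mock_reply_policy policy,
			     const struct mock_gsp_reply *reply,
			     size_t expected_payload)
{
	switch (policy) {
	case MOCK_REPLY_NOWAIT:
		return 0;
	case MOCK_REPLY_RECV:
		if (!reply->header_ready ||
		    reply->payload_avail < expected_payload)
			return -ETIMEDOUT;
		return 0;
	case MOCK_REPLY_POLL:
		return reply->header_ready ? 0 : -ETIMEDOUT;
	}

	return -EINVAL;
}
```

With a header-only reply, as described for
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, the "receive the entire message" policy
times out in this model while the header-only policy succeeds, which
mirrors the failure seen on the suspend/resume path.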
[1] https://lore.kernel.org/nouveau/7eb31f1f-fc3a-4fb5-86cf-4bd011d68ff1@xxxxxxxxxx/T/#t

Zhi Wang (5):
  drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
  drm/nouveau/nvkm: factor out the current RPC command reply policies
  drm/nouveau/nvkm: introduce new GSP reply policy
    NVKM_GSP_RPC_REPLY_POLL
  drm/nouveau/nvkm: use the new policy for
    NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
  drm/nouveau/nvkm: introduce a kernel doc for GSP message handling

 Documentation/gpu/nouveau.rst                 |  3 +
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 34 ++++++--
 .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 80 +++++++++++--------
 .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
 5 files changed, 78 insertions(+), 43 deletions(-)

-- 
2.43.5