NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command. The policy
it actually requires is NVKM_GSP_RPC_REPLY_POLL. This can be observed
from a dump of the GSP message queue: after the large GSP RPC command is
issued, GSP writes only an empty RPC header to the queue as the reply.

Without this change, the policy "receiving the entire message" is used
for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. This causes a timeout while
receiving the returned GSP message in the suspend/resume path:

[ 80.683646] r535_gsp_rpc_push() - 962: rpc->function 4 gsp_rpc_len 0 payload_size 2c630 max_payload_size ffb0
[ 80.704222] r535_gsp_msg_recv() - 501: recv rpc->fn 4, rpc->length 20
[ 81.014566] mlx5_core 0000:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 83.384132] mlx5_core 0000:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 103.784986] ------------[ cut here ]------------
[ 103.789620] WARNING: CPU: 6 PID: 246 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:201 r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[ 103.801441] libcxgbi(E) libcxgb(E) qla4xxx(E) iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E)
[ 103.903122] CPU: 6 UID: 0 PID: 246 Comm: kworker/u130:30 Tainted: G E 6.14.0-rc1+ #1
[ 103.912254] Tainted: [E]=UNSIGNED_MODULE
[ 103.916193] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[ 103.923940] Workqueue: async async_run_entry_fn
[ 103.928486] RIP: 0010:r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[ 103.934372] Code: 00 00 49 8b 94 24 e8 08 00 00 8b 12 29 ea 01 d0 0f 43 c2 39 d8 72 c8 41 8b 55 00 85 d2 74 0b 5b 5d 41 5c 41 5d e9 cf 0c 43 e6 <0f> 0b b8 92 ff ff ff 5b 5d 41 5c 41 5d e9 bd 0c 43 e6 66 90 90 90
[ 103.953140] RSP: 0018:ffffb81a40baf970 EFLAGS: 00010246
[ 103.958381] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000009651f0
[ 103.965524] RDX: 0000000000000000 RSI: 0000000055555554 RDI: ffffb81a40baf8c0
[ 103.972663] RBP: 000000000000001a R08: 0000000000000001 R09: 0000000000000000
[ 103.979805] R10: 0000000000000000 R11: ffff97f70e33424c R12: ffff97d848d40000
[ 103.986948] R13: ffffb81a40baf9fc R14: ffff97d848d40000 R15: 0000000000000000
[ 103.994090] FS: 0000000000000000(0000) GS:ffff97f70e300000(0000) knlGS:0000000000000000
[ 104.002187] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 104.007947] CR2: 00000a8bdc3b8000 CR3: 000000016e738001 CR4: 0000000000770ef0
[ 104.015087] PKRU: 55555554
[ 104.017809] Call Trace:
[ 104.020268] <TASK>
[ 104.022382] ? __warn+0x84/0x130
[ 104.025627] ? r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[ 104.030889] ? report_bug+0x18a/0x1a0
[ 104.034561] ? handle_bug+0x53/0x90
[ 104.038061] ? exc_invalid_op+0x14/0x70
[ 104.041910] ? asm_exc_invalid_op+0x16/0x20
[ 104.046109] ? r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[ 104.051351] r535_gsp_msgq_recv+0x13c/0x1e0 [nouveau]
[ 104.056588] r535_gsp_msg_recv+0xa9/0x260 [nouveau]
[ 104.061654] r535_gsp_rpc_push+0x12c/0x1b0 [nouveau]
[ 104.066805] fbsr_memlist+0x13a/0x1c0 [nouveau]
[ 104.071564] r535_instmem_suspend+0x3e4/0x720 [nouveau]
[ 104.076997] ? srso_alias_return_thunk+0x5/0xfbef5
[ 104.081807] ? prb_read+0x6f/0x150
[ 104.085225] ? nvkm_instmem_fini+0x25/0x60 [nouveau]
[ 104.090383] nvkm_instmem_fini+0x25/0x60 [nouveau]
[ 104.095371] nvkm_subdev_fini+0x66/0x150 [nouveau]
[ 104.100353] ? down_write+0xe/0x60
[ 104.103765] nvkm_device_fini+0x94/0x1e0 [nouveau]
[ 104.108808] nvkm_udevice_fini+0x4f/0x70 [nouveau]
[ 104.113831] nvkm_object_fini+0xb8/0x240 [nouveau]
[ 104.118814] nvkm_object_fini+0x6e/0x240 [nouveau]
[ 104.123788] nouveau_do_suspend+0xf9/0x210 [nouveau]
[ 104.128997] nouveau_pmops_suspend+0x39/0x80 [nouveau]

Use the new policy NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY.

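As an illustration only, the difference between the two policies can be sketched with a toy model. This is not the nouveau implementation: every name below (toy_reply_policy, toy_bytes_needed, toy_wait_reply, TOY_HDR_SIZE) is hypothetical, the header size assumes that "rpc->length 20" in the log is hexadecimal, and the model assumes that a wait for the "entire message" expects payload bytes beyond the header.

```c
#include <stddef.h>

/* Hypothetical names: a toy model, not the nouveau code. */
enum toy_reply_policy {
	TOY_REPLY_RECV, /* wait until header + payload are queued */
	TOY_REPLY_POLL, /* wait only until a reply header is queued */
};

/* Assumption: "rpc->length 20" in the log is hex, i.e. a bare header. */
#define TOY_HDR_SIZE 0x20

/* Number of bytes the waiter must see in the queue before returning. */
size_t toy_bytes_needed(enum toy_reply_policy policy, size_t payload_size)
{
	if (policy == TOY_REPLY_RECV)
		return TOY_HDR_SIZE + payload_size;
	return TOY_HDR_SIZE;
}

/* Returns 0 if the queued bytes satisfy the policy, -1 to model a timeout. */
int toy_wait_reply(enum toy_reply_policy policy, size_t payload_size,
		   size_t bytes_queued)
{
	return bytes_queued >= toy_bytes_needed(policy, payload_size) ? 0 : -1;
}
```

With the values from the log (payload_size 0x2c630 and a single 0x20-byte header in the queue), toy_wait_reply() returns -1 under TOY_REPLY_RECV and 0 under TOY_REPLY_POLL, which mirrors the timeout described above and why polling for the header avoids it.
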
Cc: Ben Skeggs <bskeggs@xxxxxxxxxx>
Signed-off-by: Zhi Wang <zhiw@xxxxxxxxxx>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
index 2789efe9c100..35ba1798ee6e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
@@ -105,7 +105,7 @@ fbsr_memlist(struct nvkm_gsp_device *device, u32 handle, enum nvkm_memory_target
 		rpc->pteDesc.pte_pde[i].pte = (phys >> 12) + i;
 	}
 
-	ret = nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
+	ret = nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_POLL);
 	if (ret)
 		return ret;
 
-- 
2.43.5