This is a note to let you know that I've just added the patch titled

    Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drivers-hv-vmbus-fix-vmbus_wait_for_unload-to-scan-present-cpus.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable
tree, please let <stable@xxxxxxxxxxxxxxx> know about it.


>From 320805ab61e5f1e2a5729ae266e16bec2904050c Mon Sep 17 00:00:00 2001
From: Michael Kelley <mikelley@xxxxxxxxxxxxx>
Date: Thu, 18 May 2023 08:13:52 -0700
Subject: Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs

From: Michael Kelley <mikelley@xxxxxxxxxxxxx>

commit 320805ab61e5f1e2a5729ae266e16bec2904050c upstream.

vmbus_wait_for_unload() may be called in the panic path after other
CPUs are stopped. vmbus_wait_for_unload() currently loops through
online CPUs looking for the UNLOAD response message. But the values of
CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
to stop the other CPUs, and in one of the paths the stopped CPUs are
removed from cpu_online_mask. This removal happens in both x86/x64
and arm64 architectures.

In such a case, vmbus_wait_for_unload() only checks the panic'ing CPU,
and misses the UNLOAD response message except when the panic'ing CPU
is CPU 0. vmbus_wait_for_unload() eventually times out, but only after
waiting 100 seconds.

Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
The cpu_present_mask is not modified by stopping the other CPUs in the
panic path, nor should it be.

Also, in a CoCo VM the synic_message_page is not allocated in
hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs()
and hv_synic_disable_regs() such that it is set only when the CPU is
online. If not all present CPUs are online when vmbus_wait_for_unload()
is called, the synic_message_page might be NULL. Add a check for this.

Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")
Cc: stable@xxxxxxxxxxxxxxx
Reported-by: John Starks <jostarks@xxxxxxxxxxxxx>
Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@xxxxxxxxxxxxx
Signed-off-by: Wei Liu <wei.liu@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 drivers/hv/channel_mgmt.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -827,11 +827,22 @@ static void vmbus_wait_for_unload(void)
 		if (completion_done(&vmbus_connection.unload_event))
 			goto completed;
 
-		for_each_online_cpu(cpu) {
+		for_each_present_cpu(cpu) {
 			struct hv_per_cpu_context *hv_cpu
 				= per_cpu_ptr(hv_context.cpu_context, cpu);
 
+			/*
+			 * In a CoCo VM the synic_message_page is not allocated
+			 * in hv_synic_alloc(). Instead it is set/cleared in
+			 * hv_synic_enable_regs() and hv_synic_disable_regs()
+			 * such that it is set only when the CPU is online. If
+			 * not all present CPUs are online, the message page
+			 * might be NULL, so skip such CPUs.
+			 */
 			page_addr = hv_cpu->synic_message_page;
+			if (!page_addr)
+				continue;
+
 			msg = (struct hv_message *)page_addr
 				+ VMBUS_MESSAGE_SINT;
 
@@ -865,11 +876,14 @@ completed:
 	 * maybe-pending messages on all CPUs to be able to receive new
 	 * messages after we reconnect.
 	 */
-	for_each_online_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		page_addr = hv_cpu->synic_message_page;
+		if (!page_addr)
+			continue;
+
 		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
 		msg->header.message_type = HVMSG_NONE;
 	}


Patches currently in stable-queue which might be from mikelley@xxxxxxxxxxxxx are

queue-5.15/drivers-hv-vmbus-fix-vmbus_wait_for_unload-to-scan-present-cpus.patch
queue-5.15/pci-hv-add-a-per-bus-mutex-state_lock.patch
queue-5.15/pci-hv-fix-a-race-condition-in-hv_irq_unmask-that-can-cause-panic.patch
queue-5.15/revert-pci-hv-fix-a-timing-issue-which-causes-kdump-to-fail-occasionally.patch
queue-5.15/pci-hv-remove-the-useless-hv_pcichild_state-from-struct-hv_pci_dev.patch
queue-5.15/drivers-hv-vmbus-call-hv_synic_free-if-hv_synic_alloc-fails.patch
queue-5.15/pci-hv-fix-a-race-condition-bug-in-hv_pci_query_relations.patch