Michael Kelley <mikelley@xxxxxxxxxxxxx> writes:

> vmbus_wait_for_unload() may be called in the panic path after other
> CPUs are stopped. vmbus_wait_for_unload() currently loops through
> online CPUs looking for the UNLOAD response message. But the values of
> CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
> to stop the other CPUs, and in one of the paths the stopped CPUs
> are removed from cpu_online_mask. This removal happens in both
> x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
> only checks the panic'ing CPU, and misses the UNLOAD response message
> except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
> eventually times out, but only after waiting 100 seconds.
>
> Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
> The cpu_present_mask is not modified by stopping the other CPUs in the
> panic path, nor should it be. Furthermore, the synic_message_page
> being checked in vmbus_wait_for_unload() is allocated in
> hv_synic_alloc() for all present CPUs. So looping through the
> present CPUs is more consistent.
>
> For additional safety, also add a check for the message_page being
> NULL before looking for the UNLOAD response message.
>
> Reported-by: John Starks <jostarks@xxxxxxxxxxxxx>
> Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")

I see you Cc:ed stable@ on the patch; should we also add

Cc: stable@xxxxxxxxxxxxxxx

here explicitly so it gets picked up by the various stable backporting
scripts? I guess Wei can do it when picking up the patch for the queue...

> Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> ---
>  drivers/hv/channel_mgmt.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 007f26d..df2ba20 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -829,11 +829,14 @@ static void vmbus_wait_for_unload(void)
>  	if (completion_done(&vmbus_connection.unload_event))
>  		goto completed;
>
> -	for_each_online_cpu(cpu) {
> +	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
>
>  		page_addr = hv_cpu->synic_message_page;
> +		if (!page_addr)
> +			continue;
> +

In theory, synic_message_page for all present CPUs is permanently
assigned in hv_synic_alloc(), and we fail the whole thing if any of
these allocations fail, so page_addr == NULL is likely impossible
today. But there's certainly no harm in having this extra check here;
this is not a hotpath. See the sketch at the end of this mail for the
invariant I mean.

>  		msg = (struct hv_message *)page_addr
>  			+ VMBUS_MESSAGE_SINT;
>
> @@ -867,11 +870,14 @@ static void vmbus_wait_for_unload(void)
>  	 * maybe-pending messages on all CPUs to be able to receive new
>  	 * messages after we reconnect.
>  	 */
> -	for_each_online_cpu(cpu) {
> +	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
>
>  		page_addr = hv_cpu->synic_message_page;
> +		if (!page_addr)
> +			continue;
> +
>  		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>  		msg->header.message_type = HVMSG_NONE;
>  	}

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
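
To spell out the invariant referenced above: every present CPU either
gets its synic_message_page assigned, or initialization fails as a
whole, so vmbus_wait_for_unload() should never observe a NULL page.
The following is a minimal sketch of that shape, not the actual
drivers/hv/hv.c code; the hv_synic_alloc_sketch name, the specific
allocation call, and the error-handling details are assumptions for
illustration only:

	/*
	 * Simplified sketch of the allocation invariant: walk the same
	 * cpu_present_mask that vmbus_wait_for_unload() now walks, and
	 * bail out of the whole initialization if any per-CPU page
	 * allocation fails. Hence the !page_addr check in the unload
	 * path is defensive rather than functionally required.
	 */
	static int hv_synic_alloc_sketch(void)
	{
		int cpu;

		for_each_present_cpu(cpu) {
			struct hv_per_cpu_context *hv_cpu
				= per_cpu_ptr(hv_context.cpu_context, cpu);

			hv_cpu->synic_message_page =
				(void *)get_zeroed_page(GFP_ATOMIC);
			if (!hv_cpu->synic_message_page)
				return -ENOMEM;	/* no partially-allocated state survives */
		}

		return 0;
	}

Under that assumption, the only way the NULL check could ever fire is
a future refactoring that relaxes the all-or-nothing allocation, which
is exactly why a cheap defensive check outside any hotpath is worth
keeping.

-- 
Vitaly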