> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Friday, March 18, 2016 5:33 AM
> To: devel@xxxxxxxxxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>;
> Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Alex Ng (LIS) <alexng@xxxxxxxxxxxxx>;
> Radim Krcmar <rkrcmar@xxxxxxxxxx>; Cathy Avery <cavery@xxxxxxxxxx>
> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> Kdump keeps biting. It turns out that CHANNELMSG_UNLOAD_RESPONSE is
> always delivered to CPU0 regardless of which CPU we send
> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
> this: when we're crashing on some other CPU while CPU0 is still alive
> and operational, CHANNELMSG_UNLOAD_RESPONSE will be delivered there,
> completing vmbus_connection.unload_event, and our wait on the current
> CPU will never end.

What was the host you were testing on?

K. Y

> Do the following:
>
> 1) Check for completion_done() in the loop. In case the interrupt
> handler is still alive, we'll get the confirmation we need.
>
> 2) Always read CPU0's message page, as CHANNELMSG_UNLOAD_RESPONSE will
> be delivered there. We can race with a still-alive interrupt handler
> doing the same, but we don't care as we're checking completion_done()
> now.
>
> 3) Clean up message pages on all CPUs. This is required (at least for
> the current CPU, as we're clearing CPU0's messages now, but we may want
> to bring up additional CPUs on crash) because new messages won't be
> delivered until we consume what's pending. On boot we'll place message
> pages somewhere else, so we won't be able to read the stale messages.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> ---
>  drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index b10e8f74..5f37057 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>
>  static void vmbus_wait_for_unload(void)
>  {
> -	int cpu = smp_processor_id();
> -	void *page_addr = hv_context.synic_message_page[cpu];
> +	int cpu;
> +	void *page_addr = hv_context.synic_message_page[0];
>  	struct hv_message *msg = (struct hv_message *)page_addr +
>  				  VMBUS_MESSAGE_SINT;
>  	struct vmbus_channel_message_header *hdr;
>  	bool unloaded = false;
>
> -	while (1) {
> +	/*
> +	 * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
> +	 * crashing on a different CPU let's hope that IRQ handler on CPU0 is
> +	 * still functional and vmbus_unload_response() will complete
> +	 * vmbus_connection.unload_event. If not, the last thing we can do is
> +	 * read message page for CPU0 regardless of what CPU we're on.
> +	 */
> +	while (!unloaded) {
> +		if (completion_done(&vmbus_connection.unload_event)) {
> +			unloaded = true;
> +			break;
> +		}
> +
>  		if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
> 			mdelay(10);
> 			continue;
> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>  			unloaded = true;
>
>  		vmbus_signal_eom(msg);
> +	}
>
> -		if (unloaded)
> -			break;
> +	/*
> +	 * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
> +	 * maybe-pending messages on all CPUs to be able to receive new
> +	 * messages after we reconnect.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		page_addr = hv_context.synic_message_page[cpu];
> +		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> +		msg->header.message_type = HVMSG_NONE;
>  	}
>  }
>
> --
> 2.5.0

_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
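
For readers following the thread, the wait-and-fallback logic the patch
introduces can be modeled as a small userspace C program. This is only a
sketch, not kernel code: the pthread/C11-atomics simulation, the msg_slot[]
array, NR_CPUS, and the function names are illustrative stand-ins for the
per-CPU SynIC message pages, completion_done(), and vmbus_signal_eom();
only the HVMSG_NONE and CHANNELMSG_UNLOAD_RESPONSE values mirror the real
protocol constants.

/* Userspace model of the patched vmbus_wait_for_unload() logic. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define HVMSG_NONE                 0   /* mirrors the Hyper-V constant */
#define CHANNELMSG_UNLOAD_RESPONSE 17  /* mirrors the VMBus constant */
#define NR_CPUS                    4   /* arbitrary, for the simulation */

/* Slot 0 stands in for CPU0's SynIC message page. */
static atomic_int msg_slot[NR_CPUS];
/* Stands in for vmbus_connection.unload_event. */
static atomic_bool unload_done;

/*
 * Simulated interrupt handler on CPU0: it may still be alive, consume the
 * response itself and complete the event before the crashing CPU notices
 * the message (like vmbus_unload_response() would).
 */
static void *cpu0_irq_handler(void *arg)
{
	(void)arg;
	usleep(50000);		/* the "host" delivers the response to CPU0 */
	atomic_store(&msg_slot[0], CHANNELMSG_UNLOAD_RESPONSE);
	usleep(10000);
	if (atomic_exchange(&msg_slot[0], HVMSG_NONE) ==
	    CHANNELMSG_UNLOAD_RESPONSE)
		atomic_store(&unload_done, true);
	return NULL;
}

/* The crashing CPU's wait loop, mirroring the shape of the patch. */
static void wait_for_unload(void)
{
	bool unloaded = false;
	int cpu;

	while (!unloaded) {
		/* 1) completion_done() check: CPU0's handler may have won. */
		if (atomic_load(&unload_done))
			break;
		/* 2) Otherwise read CPU0's slot, whatever CPU we're on. */
		if (atomic_load(&msg_slot[0]) == HVMSG_NONE) {
			usleep(10000);	/* mdelay(10) stand-in */
			continue;
		}
		if (atomic_exchange(&msg_slot[0], HVMSG_NONE) ==
		    CHANNELMSG_UNLOAD_RESPONSE)	/* read + signal EOM */
			unloaded = true;
	}

	/* 3) Clear every per-CPU slot so no stale message blocks delivery. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		atomic_store(&msg_slot[cpu], HVMSG_NONE);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, cpu0_irq_handler, NULL);
	wait_for_unload();
	pthread_join(t, NULL);
	printf("unload completed without hanging\n");
	return 0;
}

Built with "cc -pthread", the waiter terminates either because the simulated
CPU0 handler completed the event first or because it consumed the response
from slot 0 itself; that both orderings are fine is exactly the race the
completion_done() check in the patch makes harmless.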