> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Tuesday, March 22, 2016 7:01 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Haiyang
> Zhang <haiyangz@xxxxxxxxxxxxx>; Alex Ng (LIS) <alexng@xxxxxxxxxxxxx>;
> Radim Krcmar <rkrcmar@xxxxxxxxxx>; Cathy Avery <cavery@xxxxxxxxxx>
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> Sent: Monday, March 21, 2016 12:52 AM
> >> To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> >> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Haiyang
> >> Zhang <haiyangz@xxxxxxxxxxxxx>; Alex Ng (LIS) <alexng@xxxxxxxxxxxxx>;
> >> Radim Krcmar <rkrcmar@xxxxxxxxxx>; Cathy Avery <cavery@xxxxxxxxxx>
> >> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> >> Sent: Friday, March 18, 2016 5:33 AM
> >> >> To: devel@xxxxxxxxxxxxxxxxxxxxxx
> >> >> Cc: linux-kernel@xxxxxxxxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>;
> >> >> Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Alex Ng (LIS)
> >> >> <alexng@xxxxxxxxxxxxx>; Radim Krcmar <rkrcmar@xxxxxxxxxx>; Cathy
> >> >> Avery <cavery@xxxxxxxxxx>
> >> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >> >>
> >> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> >> >> delivered to CPU0 regardless of what CPU we're sending
> >> >> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
> >> >> the fact that, in case we're crashing on some other CPU and CPU0 is
> >> >> still alive and operational, CHANNELMSG_UNLOAD_RESPONSE will be
> >> >> delivered there, completing vmbus_connection.unload_event; our wait
> >> >> on the current CPU will never end.
> >> >
> >> > What was the host you were testing on?
> >> >
> >>
> >> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> >> by forcing crash on a secondary CPU, e.g.:
> >
> > Prior to 2012R2, all messages would be delivered on CPU0 and this includes
> > CHANNELMSG_UNLOAD_RESPONSE. For this reason we don't support kexec on
> > pre-2012 R2 hosts. On 2012. From 2012 R2 on, all vmbus messages
> > (responses) will be delivered on the CPU that we initially set up - look
> > at the code in vmbus_negotiate_version(). So on post 2012 R2 hosts, the
> > response to CHANNELMSG_UNLOAD will be delivered on the CPU where we
> > initiate the contact with the host - CHANNELMSG_INITIATE_CONTACT message.
>
> Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4. On
> WS2012R2 what you're saying is true and all messages including
> CHANNELMSG_UNLOAD_RESPONSE are delivered to the CPU we used for initial
> contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a special
> case and it is always delivered to CPU0, no matter which CPU we used for
> initial contact. This can be a host bug. You can use the attached patch
> to see the issue.

This looks like a host bug and I will try to get it addressed before ws2016 ships.

> For now I can suggest we check message pages for all CPUs from
> vmbus_wait_for_unload(). We can race with other CPUs again but we don't
> care as we're checking for completion_done() in the loop as well. I'll
> try this approach.

Thank you.

K. Y

>
> --
> Vitaly
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
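
For readers following the thread, the approach suggested above (scan the message pages of all CPUs from the wait loop, while also checking whether the unload event was already completed elsewhere) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the patch itself: every name here (SIM_NR_CPUS, sim_msg_slot, sim_unload_completed, sim_scan_all_cpus, sim_wait_for_unload) is hypothetical, the per-CPU message pages are reduced to a single slot per CPU, and the delay between polls is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins for illustration;
 * none of them are the kernel's real types or functions. */
#define SIM_NR_CPUS 4

enum sim_msg_type {
	SIM_MSG_NONE = 0,         /* slot is empty */
	SIM_MSG_OTHER,            /* some unrelated message */
	SIM_MSG_UNLOAD_RESPONSE,  /* stands in for CHANNELMSG_UNLOAD_RESPONSE */
};

/* One pending-message slot per CPU, standing in for per-CPU message pages. */
static enum sim_msg_type sim_msg_slot[SIM_NR_CPUS];

/* Stands in for completion_done() on the unload event: set when another
 * CPU's handler has already processed the response. */
static bool sim_unload_completed;

/* Scan every CPU's slot once. Returns true if the unload response was
 * found (and consumes it); unrelated messages are left untouched. */
static bool sim_scan_all_cpus(void)
{
	for (size_t cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (sim_msg_slot[cpu] == SIM_MSG_UNLOAD_RESPONSE) {
			sim_msg_slot[cpu] = SIM_MSG_NONE;
			return true;
		}
	}
	return false;
}

/* Poll up to max_polls times; true means the unload response was seen,
 * either directly in some CPU's slot or via the completion flag. */
static bool sim_wait_for_unload(int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (sim_unload_completed || sim_scan_all_cpus())
			return true;
		/* A real wait loop would delay here between polls. */
	}
	return false;
}
```

The point of scanning every slot is that the waiter no longer depends on which CPU the host picks for delivery: a response left in CPU0's slot is found even when the wait runs on behalf of another CPU. Checking the completion flag in the same loop covers the race mentioned above, where a still-running CPU handles the message first.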