On Tue, Apr 20, 2021 at 11:31:54AM +0200, Vitaly Kuznetsov wrote:
> Michael Kelley <mikelley@xxxxxxxxxxxxx> writes:
>
> > When running in Azure, disks may be connected to a Linux VM with
> > read/write caching enabled. If a VM panics and issues a VMbus
> > UNLOAD request to Hyper-V, the response is delayed until all dirty
> > data in the disk cache is flushed. In extreme cases, this flushing
> > can take 10's of seconds, depending on the disk speed and the amount
> > of dirty data. If kdump is configured for the VM, the current 10 second
> > timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD
> > complete message may arrive well after the kdump kernel is already
> > running, causing problems. Note that no problem occurs if kdump is
> > not enabled because Hyper-V waits for the cache flush before doing
> > a reboot through the BIOS/UEFI code.
> >
> > Fix this problem by increasing the timeout in vmbus_wait_for_unload()
> > to 100 seconds. Also output periodic messages so that if anyone is
> > watching the serial console, they won't think the VM is completely
> > hung.
> >
> > Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload")
> > Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>

Applied to hyperv-next. Thanks.

> > ---

[...]

> >
> > +#define UNLOAD_DELAY_UNIT_MS	10		/* 10 milliseconds */
> > +#define UNLOAD_WAIT_MS		(100*1000)	/* 100 seconds */
> > +#define UNLOAD_WAIT_LOOPS	(UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS)
> > +#define UNLOAD_MSG_MS		(5*1000)	/* Every 5 seconds */
> > +#define UNLOAD_MSG_LOOPS	(UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS)
> > +
> >  static void vmbus_wait_for_unload(void)
> >  {
> >  	int cpu;
> > @@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void)
> >  	 * vmbus_connection.unload_event. If not, the last thing we can do is
> >  	 * read message pages for all CPUs directly.
> >  	 *
> > -	 * Wait no more than 10 seconds so that the panic path can't get
> > -	 * hung forever in case the response message isn't seen.
> > +	 * Wait up to 100 seconds since an Azure host must writeback any dirty
> > +	 * data in its disk cache before the VMbus UNLOAD request will
> > +	 * complete. This flushing has been empirically observed to take up
> > +	 * to 50 seconds in cases with a lot of dirty data, so allow additional
> > +	 * leeway and for inaccuracies in mdelay(). But eventually time out so
> > +	 * that the panic path can't get hung forever in case the response
> > +	 * message isn't seen.
>
> I vaguely remember debugging cases when CHANNELMSG_UNLOAD_RESPONSE never
> arrives, it was kind of pointless to proceed to kexec as attempts to
> reconnect Vmbus devices were failing (no devices were offered after
> CHANNELMSG_REQUESTOFFERS AFAIR). Would it maybe make sense to just do
> emergency reboot instead of proceeding to kexec when this happens? Just
> wondering.
>

Please submit a follow-up patch if necessary.

Wei.
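
For readers following along without the full diff (the body of the hunk is
trimmed above): the constants are sized so that the wait loop polls every
UNLOAD_DELAY_UNIT_MS milliseconds, gives up after UNLOAD_WAIT_LOOPS
iterations (100 seconds total), and prints a progress message every
UNLOAD_MSG_LOOPS iterations (every 5 seconds). A minimal sketch of how such
a loop inside vmbus_wait_for_unload() might look is below; it is not the
actual hunk from the patch, it omits the per-CPU message-page scanning the
real function performs on each pass, and the log text is illustrative only:

	int i;

	for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) {
		/* Stop early once the UNLOAD response has been delivered. */
		if (completion_done(&vmbus_connection.unload_event))
			break;

		/* (The real function also scans per-CPU message pages here.) */

		mdelay(UNLOAD_DELAY_UNIT_MS);

		/* Reassure anyone watching the serial console periodically. */
		if ((i % UNLOAD_MSG_LOOPS) == 0)
			pr_notice("Waiting for VMBus UNLOAD to complete\n");
	}

With UNLOAD_DELAY_UNIT_MS at 10 ms, the modulo check fires every 500
iterations, which keeps the console output to one line every 5 seconds
rather than flooding the log while the host flushes its disk cache.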