> From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> > Sent: Thursday, September 12, 2019 9:37 AM > > +static int util_suspend(struct hv_device *dev) > > +{ > > + struct hv_util_service *srv = hv_get_drvdata(dev); > > + > > + if (srv->util_cancel_work) > > + srv->util_cancel_work(); > > + > > + vmbus_close(dev->channel); > > And what happens if you receive a real reply from userspace after you > close the channel? You've only cancelled timeout work so the driver > will not try to reply by itself but this doesn't mean we won't try to > write to the channel on legitimate replies from daemons. > > I think you need to block all userspace requests (hang in kernel until > util_resume()). > > While I'm not sure we can't get away without it but I'd suggest we add a > new HVUTIL_SUSPENDED state to the hvutil state machine. > Vitaly When we reach util_suspend(), all the userspace processes have been frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(), so here we can not receive a reply from the userspace daemon. However, it looks there is indeed some tricky corner cases we need to deal with: in util_resume(), before we call vmbus_open(), all the userspace processes are still frozen, and the userspace daemon (e.g. hv_kvp_daemon) can be in any of these states: 1. the driver has not buffered any message for the daemon. This is good. 2. the driver has buffered a message for the daemon, and kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon writes the response to the driver, and in kvp_on_msg() kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has been cancelled by util_suspend()), the driver reports nothing to the host, which is good as the VM has undergone a hibernation event and IMO the response may be stale and I believe the host is not expecting this response from the VM at all (the host side application that requested the KVP must have failed or timed out), but the bad thing is: the "state" remains in HVUTIL_USERSPACE_RECV, preventing hv_kvp_onchannelcallback() from reading from the channel ringbuffer. I suggest we work around this race condition by the below patch: --- a/drivers/hv/hv_kvp.c +++ b/drivers/hv/hv_kvp.c @@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len) */ if (cancel_delayed_work_sync(&kvp_timeout_work)) { kvp_respond_to_host(message, error); - hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper); } + hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper); return 0; } How do you like this? I can imagine there is still a small chance that the state machine can run out of order, and the kvp daemon may exit due to the return values of read()/write() being -EINVAL, but the chance should be small enough in practice, and IMO the issue even exists in the current code when hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg() can run concurrently; if kvp_on_msg() runs slowly due to some reason (e.g. the kvp daemon is stopped as I'm gdb'ing it), kvp_timeout_func() fires and moves the state to HVUTIL_READY; later kvp_on_msg() starts to run and returns -EINVAL, and the kvp daemon will exit(). IMHO here it looks extremely difficult to make things flawless (if that's even possible), so it's necessary to ask the daemons to auto-restart once they exit() unexpectedly. This can be achieved by configuring systemd properly for the kvp/vss/fcopy services. I'm not sure introducing a HVUTIL_SUSPENDED state would solve all of the corner cases, but I'm sure that would further complicate the current code, at least to me. :-) Please share your thoughts. I believe you know all of these much better than me, as you're the author of the state machine. :-) Thanks, -- Dexuan