Dexuan Cui <decui@xxxxxxxxxxxxx> writes:

>> From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> Sent: Thursday, September 12, 2019 9:37 AM
>
>> > +static int util_suspend(struct hv_device *dev)
>> > +{
>> > +	struct hv_util_service *srv = hv_get_drvdata(dev);
>> > +
>> > +	if (srv->util_cancel_work)
>> > +		srv->util_cancel_work();
>> > +
>> > +	vmbus_close(dev->channel);
>>
>> And what happens if you receive a real reply from userspace after you
>> close the channel? You've only cancelled the timeout work, so the
>> driver will not try to reply by itself, but this doesn't mean we won't
>> try to write to the channel on legitimate replies from daemons.
>>
>> I think you need to block all userspace requests (hang in kernel until
>> util_resume()).
>>
>> While I'm not sure we can't get away without it, I'd suggest we add a
>> new HVUTIL_SUSPENDED state to the hvutil state machine.
>>
>> Vitaly
>
> When we reach util_suspend(), all the userspace processes have been
> frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(),
> so here we cannot receive a reply from the userspace daemon.

Let's add a WARN() or something then, as if this ever happens it is
going to be really tricky to catch.

> However, it looks like there are indeed some tricky corner cases we
> need to deal with: in util_resume(), before we call vmbus_open(), all
> the userspace processes are still frozen, and the userspace daemon
> (e.g. hv_kvp_daemon) can be in any of these states:
>
> 1. The driver has not buffered any message for the daemon. This is
> good.
>
> 2. The driver has buffered a message for the daemon, and
> kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon
> writes the response to the driver, and in kvp_on_msg()
> kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since
> cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has
> been cancelled by util_suspend()), the driver reports nothing to the
> host. That part is good: the VM has undergone a hibernation event, so
> IMO the response may be stale, and I believe the host is not expecting
> this response from the VM at all (the host-side application that
> requested the KVP must have failed or timed out). The bad thing is:
> the "state" remains HVUTIL_USERSPACE_RECV, preventing
> hv_kvp_onchannelcallback() from reading from the channel ring buffer.
>
> I suggest we work around this race condition with the patch below:
>
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len)
>  	 */
>  	if (cancel_delayed_work_sync(&kvp_timeout_work)) {
>  		kvp_respond_to_host(message, error);
> -		hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>  	}
> +	hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>
>  	return 0;
>  }
>
> How do you like this?

Is it safe to call hv_poll_channel() simultaneously on several CPUs?
It seems it is, as we're doing

  smp_call_function_single(channel->target_cpu, cb, channel, true);

(I'm asking because if it's not, then doing what you suggest will open
the following window: the timeout function kvp_timeout_func() is
already running while the daemon is trying to reply at the same
moment.)
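For reference, the helper is just a thin wrapper; a simplified sketch
from memory follows (the NULL check is my assumption -- the cross-CPU
call is the part that matters for the question above):

static inline void hv_poll_channel(struct vmbus_channel *channel,
				   void (*cb)(void *))
{
	if (!channel)
		return;

	/*
	 * Runs cb(channel) on channel->target_cpu and, since the last
	 * argument (wait) is true, blocks until the callback has
	 * finished.  Concurrent callers therefore end up running the
	 * callback serially on the target CPU.
	 */
	smp_call_function_single(channel->target_cpu, cb, channel, true);
}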
> I can imagine there is still a small chance that the state machine can
> run out of order, and the kvp daemon may exit due to the return values
> of read()/write() being -EINVAL, but the chance should be small enough
> in practice, and IMO the issue even exists in the current code when
> hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg()
> can run concurrently; if kvp_on_msg() runs slowly for some reason
> (e.g. the kvp daemon is stopped while I'm gdb'ing it),
> kvp_timeout_func() fires and moves the state to HVUTIL_READY; later
> kvp_on_msg() starts to run, returns -EINVAL, and the kvp daemon will
> exit().
>
> IMHO it looks extremely difficult to make things flawless here (if
> that's even possible), so it's necessary to ask the daemons to
> auto-restart once they exit() unexpectedly. This can be achieved by
> configuring systemd properly for the kvp/vss/fcopy services.

I think we can also teach the daemons to ignore -EINVAL, or switch to
something like -EAGAIN in non-fatal cases.

> I'm not sure introducing a HVUTIL_SUSPENDED state would solve all
> of the corner cases, but I'm sure it would further complicate the
> current code, at least to me. :-)

Maybe; if we don't need it, then we don't. Basically, I see only one
advantage in having such a state: it makes our tricks to support
hibernation more visible in the code.

--
Vitaly
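P.S. By "teach the daemons" I mean something along these lines in the
daemon's receive loop. This is an illustrative sketch only, not the
exact code from tools/hv/hv_kvp_daemon.c -- kvp_fd/hv_msg stand in for
the daemon's real variables, and setup/headers are omitted:

	while (1) {
		ssize_t len = read(kvp_fd, &hv_msg, sizeof(hv_msg));

		if (len < 0) {
			if (errno == EINVAL || errno == EAGAIN) {
				/*
				 * Treat as transient: the kernel side
				 * may be resetting its state machine
				 * (e.g. around hibernation), so retry
				 * instead of exit()ing and relying on
				 * systemd to restart us.
				 */
				continue;
			}
			syslog(LOG_ERR, "read failed; error: %d %s",
			       errno, strerror(errno));
			exit(EXIT_FAILURE);
		}

		/* ... handle the message and write() the reply ... */
	}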