> From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Sent: Monday, September 16, 2019 1:46 AM
> Dexuan Cui <decui@xxxxxxxxxxxxx> writes:
>
> >> From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> >> Sent: Thursday, September 12, 2019 9:37 AM
> >
> >> > +static int util_suspend(struct hv_device *dev)
> >> > +{
> >> > +	struct hv_util_service *srv = hv_get_drvdata(dev);
> >> > +
> >> > +	if (srv->util_cancel_work)
> >> > +		srv->util_cancel_work();
> >> > +
> >> > +	vmbus_close(dev->channel);
> >>
> >> And what happens if you receive a real reply from userspace after you
> >> close the channel? You've only cancelled timeout work so the driver
> >> will not try to reply by itself but this doesn't mean we won't try to
> >> write to the channel on legitimate replies from daemons.
> >>
> >> I think you need to block all userspace requests (hang in kernel until
> >> util_resume()).
> >>
> >> While I'm not sure we can't get away without it, I'd suggest we add a
> >> new HVUTIL_SUSPENDED state to the hvutil state machine.
> >> Vitaly
> >
> > When we reach util_suspend(), all the userspace processes have been
> > frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(),
> > so here we cannot receive a reply from the userspace daemon.
> >
> Let's add a WARN() or something then, as if this ever happens it is
> going to be really tricky to catch.

Looking at the path hibernate() -> freeze_processes() ->
try_to_freeze_tasks(true) -> freeze_task() -> fake_signal_wake_up(),
I'm sure that when try_to_freeze_tasks(true) returns 0, all the user-space
processes must be frozen in do_signal() -> get_signal() ->
try_to_freeze() -> ... -> __refrigerator().

hibernate() -> hibernation_snapshot() -> dpm_suspend() -> ... -> util_suspend()
only runs after hibernate() -> freeze_processes(), so I'm pretty sure we're
safe here.

> > However, it looks like there are indeed some tricky corner cases we need
> > to deal with: in util_resume(), before we call vmbus_open(), all the
> > userspace processes are still frozen, and the userspace daemon (e.g.
> > hv_kvp_daemon) can be in any of these states:
> >
> > 1. The driver has not buffered any message for the daemon. This is good.
> >
> > 2. The driver has buffered a message for the daemon, and
> > kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon
> > writes the response to the driver, and in kvp_on_msg()
> > kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since
> > cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has
> > been cancelled by util_suspend()), the driver reports nothing to the host.
> > That part is good: the VM has undergone a hibernation event, so IMO the
> > response may be stale, and I believe the host is not expecting this
> > response from the VM at all (the host-side application that requested the
> > KVP must have failed or timed out). The bad thing is: the "state"
> > remains in HVUTIL_USERSPACE_RECV, preventing
> > hv_kvp_onchannelcallback() from reading from the channel ring buffer.
> >
> > I suggest we work around this race condition with the below patch:
> >
> > --- a/drivers/hv/hv_kvp.c
> > +++ b/drivers/hv/hv_kvp.c
> > @@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len)
> >  	 */
> >  	if (cancel_delayed_work_sync(&kvp_timeout_work)) {
> >  		kvp_respond_to_host(message, error);
> > -		hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
> >  	}
> > +	hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
> >
> >  	return 0;
> >  }
> >
> > How do you like this?
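
(Just to make the effect concrete: with that change applied, the tail of
kvp_on_msg() would read roughly as below. This is only an untested sketch --
the comments are mine, and the surrounding context lines may differ slightly
from the current hv_kvp.c.)

	if (cancel_delayed_work_sync(&kvp_timeout_work)) {
		/* The timeout has not fired yet: forward the reply to the host. */
		kvp_respond_to_host(message, error);
	}

	/*
	 * Re-poll the channel unconditionally, even when the timeout work was
	 * already cancelled (e.g. by util_suspend()), so the state does not
	 * stay in HVUTIL_USERSPACE_RECV and hv_kvp_onchannelcallback() can
	 * read from the ring buffer again after resume.
	 */
	hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);

	return 0;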
> >
>
> Is it safe to call hv_poll_channel() simultaneously on several CPUs? It
> seems it is as we're doing
>
> 	smp_call_function_single(channel->target_cpu, cb, channel, true);

Looks safe to me. The code has been there for years. :-)

> (I'm asking because if it's not, then doing what you suggest will open
> the following window: the timeout function kvp_timeout_func() is already
> running, but the daemon is trying to reply at the same moment).
>
> > I can imagine there is still a small chance that the state machine can run
> > out of order, and the kvp daemon may exit due to the return values of
> > read()/write() being -EINVAL, but the chance should be small enough in
> > practice, and IMO the issue even exists in the current code when
> > hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg()
> > can run concurrently; if kvp_on_msg() runs slowly for some reason
> > (e.g. the kvp daemon is stopped because I'm gdb'ing it), kvp_timeout_func()
> > fires and moves the state to HVUTIL_READY; later kvp_on_msg() starts
> > to run and returns -EINVAL, and the kvp daemon will exit().
> >
> > IMHO it looks extremely difficult to make things flawless here (if that's
> > even possible), so it's necessary to ask the daemons to auto-restart once
> > they exit() unexpectedly. This can be achieved by configuring systemd
> > properly for the kvp/vss/fcopy services.
>
> I think we can also teach daemons to ignore -EINVAL or switch to
> something like -EAGAIN in non-fatal cases.

Maybe the driver should return 0 rather than -EINVAL to the kvp daemon in
some scenarios, since KVP is never 100% reliable anyway.

> > I'm not sure introducing a HVUTIL_SUSPENDED state would solve all
> > of the corner cases, but I'm sure that would further complicate the
> > current code, at least to me. :-)
> >
>
> Maybe; if we don't need it, then we don't. Basically, I see only one
> advantage in having such a state: it makes our tricks to support
> hibernation more visible in the code.
> Vitaly

Let me think about this.

BTW, for vss, maybe the VM should not hibernate at all if there is a backup
ongoing? If the file system is frozen by hv_vss_daemon and the VM hibernates,
then when the VM resumes back, it's almost always true that the VM won't
receive the host's VSS_OP_THAW request, and the VM will end up in an
unusable state.

Thanks,
-- Dexuan
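
PS: if we also go the "teach the daemons" route, I imagine something along
the lines below in the daemon's read path would be enough. This is only a
rough, untested sketch: kvp_read_msg() is a hypothetical helper, and kvp_fd
just follows the naming used in tools/hv/hv_kvp_daemon.c, not its exact code.

#include <errno.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

/*
 * Read one message from the kvp device, retrying on errors we decide to
 * treat as transient (e.g. a spurious EINVAL right after resuming from
 * hibernation, or after a timeout race) instead of letting the daemon exit().
 */
static ssize_t kvp_read_msg(int kvp_fd, void *buf, size_t len)
{
	for (;;) {
		ssize_t ret = pread(kvp_fd, buf, len, 0);

		if (ret == (ssize_t)len)
			return ret;

		if (ret < 0 && (errno == EINVAL || errno == EAGAIN)) {
			/* Likely a post-resume/timeout race: just retry. */
			syslog(LOG_WARNING,
			       "kvp read: transient error %d (%s), retrying",
			       errno, strerror(errno));
			sleep(1);
			continue;
		}

		/* Short reads and other errors remain fatal, as today. */
		syslog(LOG_ERR, "kvp read failed: ret=%zd errno=%d (%s)",
		       ret, errno, strerror(errno));
		return -1;
	}
}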