RE: [PATCH 1/3] hv_utils: Add the support of hibernation

Dexuan Cui <decui@xxxxxxxxxxxxx> · Fri, 13 Sep 2019 19:15:30 +0000

> From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Sent: Thursday, September 12, 2019 9:37 AM

> > +static int util_suspend(struct hv_device *dev)
> > +{
> > +	struct hv_util_service *srv = hv_get_drvdata(dev);
> > +
> > +	if (srv->util_cancel_work)
> > +		srv->util_cancel_work();
> > +
> > +	vmbus_close(dev->channel);
> 
> And what happens if you receive a real reply from userspace after you
> close the channel? You've only cancelled timeout work so the driver
> will not try to reply by itself but this doesn't mean we won't try to
> write to the channel on legitimate replies from daemons.
> 
> I think you need to block all userspace requests (hang in kernel until
> util_resume()).
> 
> While I'm not sure we can't get away without it but I'd suggest we add a
> new HVUTIL_SUSPENDED state to the hvutil state machine.
> Vitaly

When we reach util_suspend(), all the userspace processes have been
frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(),
so here we can not receive a reply from the userspace daemon.

However, it looks there is indeed some tricky corner cases we need to deal
with: in util_resume(), before we call vmbus_open(), all the userspace
processes are still frozen, and the userspace daemon (e.g. hv_kvp_daemon) 
can be in any of these states:

1. the driver has not buffered any message for the daemon. This is good.

2. the driver has buffered a message for the daemon, and
kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon
writes the response to the driver, and in kvp_on_msg() 
kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since
cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has
been cancelled by util_suspend()), the driver reports nothing to the host,
which is good as the VM has undergone a hibernation event and IMO the
response may be stale and I believe the host is not expecting this 
response from the VM at all (the host side application that requested the
KVP must have failed or timed out), but the bad thing is: the "state"
remains in HVUTIL_USERSPACE_RECV, preventing
hv_kvp_onchannelcallback() from reading from the channel ringbuffer.

I suggest we work around this race condition by the below patch:

--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len)
         */
        if (cancel_delayed_work_sync(&kvp_timeout_work)) {
                kvp_respond_to_host(message, error);
-               hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
        }
+       hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);

        return 0;
 }

How do you like this?

I can imagine there is still a small chance that the state machine can run
out of order, and the kvp daemon may exit due to the return values of
read()/write() being -EINVAL, but the chance should be small enough in
practice, and IMO the issue even exists in the current code when
hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg() 
can run concurrently; if kvp_on_msg() runs slowly due to some reason 
(e.g. the kvp daemon is stopped as I'm gdb'ing it), kvp_timeout_func()
fires and moves the state to HVUTIL_READY; later kvp_on_msg() starts
to run and returns -EINVAL, and the kvp daemon will exit().

IMHO here it looks extremely difficult to make things flawless (if that's
even possible), so it's necessary to ask the daemons to auto-restart once
they exit() unexpectedly. This can be achieved by configuring systemd
properly for the kvp/vss/fcopy services. 

I'm not sure introducing a HVUTIL_SUSPENDED state would solve all
of the corner cases, but I'm sure that would further complicate the
current code, at least to me. :-)

Please share your thoughts. I believe you know all of these much
better than me, as you're the author of the state machine. :-)

Thanks,
-- Dexuan