From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Thursday, January 23, 2020 12:12 AM > > > From: Michael Kelley <mikelley@xxxxxxxxxxxxx> > > Sent: Wednesday, January 22, 2020 10:28 AM > > ... > > > +int hv_fcopy_pre_suspend(void) > > > +{ > > > + struct vmbus_channel *channel = fcopy_transaction.recv_channel; > > > + struct hv_fcopy_hdr *fcopy_msg; > > > + > > > + tasklet_disable(&channel->callback_event); > > > + ... > > > + fcopy_msg = kzalloc(sizeof(*fcopy_msg), GFP_KERNEL); > > > + if (!fcopy_msg) > > > + goto err; > > > + > > > + fcopy_msg->operation = CANCEL_FCOPY; > > > + > > > + hv_fcopy_cancel_work(); > > > + > > > + /* We don't care about the return value. */ > > > + hvutil_transport_send(hvt, fcopy_msg, sizeof(*fcopy_msg), NULL); > > > + > > > + kfree(fcopy_msg); > > > + > > > + fcopy_transaction.state = HVUTIL_READY; > > > > Is the ordering correct here? > > This is intentional. I'll add a comment (please see the below). > > > It seems like the fcopy daemon could receive > > the cancel message and do the write before the state is forced to > > HVUTIL_READY. > > This can not happen, because when we're here from util_suspend(), all the > userspace processes have been frozen (please refer to another mail from me > https://lkml.org/lkml/2020/1/13/1021). The userspace is thawed only after > util_resume() and the other part of the resume procedure finish. Oh, right. That makes sense now. > > When we're here in util_suspend(), we can be in any of the below states: > > #1: hv_utils has not queued any message to the userspace daemon. > Now hv_fcopy_pre_suspend() queues a message to the daemon, and forces > the state to HVUTIL_READY; the daemon should read the message without > any error; later when the daemon calls write(), the write() returns -1 because > fcopy_transaction.state != HVUTIL_USERSPACE_REQ and fcopy_on_msg() > returns -EINVAL; the daemon responds to the write() error by closing and > re-opening the dev file, which triggers a reset in the hv_utils driver: see > hvt_op_release() -> hvt_reset() -> fcopy_on_reset(), and later the daemon > registers itself to the hv_utils driver, and everything comes back to normal. > > #2: hv_utils has queued a message to the userspace daemon. > Now hv_fcopy_pre_suspend() fails to queue an extra message to the > daemon, but still forces the state to HVUTIL_READY. > > #2.1 the userspace has not read the message. > The daemon reads the queued message and later the write() fails, so the > daemon closes and re-opens the dev file. > > #2.2 the userspace has read the message, but has not called write() yet. > The write() fails, so the daemon closes and re-opens the dev file. > > #2.3 the userspace has read the message, and has called write(). > This is actualy the same as #1. > > So, IMO the patch should be handling the scenarios correctly. > > > > + > > > + /* tasklet_enable() will be called in hv_fcopy_pre_resume(). */ > > > + > > > + return 0; > > > +err: > > > + tasklet_enable(&channel->callback_event); > > > > A nit, but if you did the memory allocation first, you could return -ENOMEM > > immediately on error and avoid the err: label and re-enabling the tasklet. > > Good idea! I'll fix this. > > > > + return -ENOMEM; > > > +} > > > ... > > > --- a/drivers/hv/hv_snapshot.c > > > +++ b/drivers/hv/hv_snapshot.c > > > @@ -229,6 +229,7 @@ static void vss_handle_request(struct work_struct > > *dummy) > > > vss_transaction.state = HVUTIL_HOSTMSG_RECEIVED; > > > vss_send_op(); > > > return; > > > + > > > > Gratuitous blank line added? > > Will remove it. I probably tried to make the "return;" more noticeable. > > > > case VSS_OP_GET_DM_INFO: > > > vss_transaction.msg->dm_info.flags = 0; > > > break; > > > ... > > > +int hv_vss_pre_suspend(void) > > > +{ > > > ... > > > + /* We don't care about the return value. */ > > > + hvutil_transport_send(hvt, vss_msg, sizeof(*vss_msg), NULL); > > > + > > > + kfree(vss_msg); > > > + > > > + vss_transaction.state = HVUTIL_READY; > > > > Same comment about the ordering. > > I'll add a comment for this. I may add a long comment in util_suspend() > and add a short comment here. > > > > + > > > + /* tasklet_enable() will be called in hv_vss_pre_resume(). */ > > > + > > > + return 0; > > > +err: > > > + tasklet_enable(&channel->callback_event); > > > + return -ENOMEM; > > > > Same comment about simplifying the error handling applies here. > > Will fix this. > > > > +static int util_suspend(struct hv_device *dev) > > > +{ > > > + struct hv_util_service *srv = hv_get_drvdata(dev); > > > + int ret = 0; > > > + > > > + if (srv->util_pre_suspend) { > > > + ret = srv->util_pre_suspend(); > > > + > > > > Unneeded blank line? > > Will remove this. > > > > + if (ret) > > > + return ret; > > > + } > > > + > > > + vmbus_close(dev->channel); > > > + > > > + return 0; > > > +} > > > + > > > +static int util_resume(struct hv_device *dev) > > > +{ > > > + struct hv_util_service *srv = hv_get_drvdata(dev); > > > + int ret = 0; > > > + > > > + if (srv->util_pre_resume) { > > > + ret = srv->util_pre_resume(); > > > + > > > > Unneeded blank line? > > Will remove this. > > Thanks, > -- Dexuan