> From: Michael Kelley <mikelley@xxxxxxxxxxxxx> > Sent: Wednesday, January 22, 2020 10:28 AM > ... > > +int hv_fcopy_pre_suspend(void) > > +{ > > + struct vmbus_channel *channel = fcopy_transaction.recv_channel; > > + struct hv_fcopy_hdr *fcopy_msg; > > + > > + tasklet_disable(&channel->callback_event); > > + ... > > + fcopy_msg = kzalloc(sizeof(*fcopy_msg), GFP_KERNEL); > > + if (!fcopy_msg) > > + goto err; > > + > > + fcopy_msg->operation = CANCEL_FCOPY; > > + > > + hv_fcopy_cancel_work(); > > + > > + /* We don't care about the return value. */ > > + hvutil_transport_send(hvt, fcopy_msg, sizeof(*fcopy_msg), NULL); > > + > > + kfree(fcopy_msg); > > + > > + fcopy_transaction.state = HVUTIL_READY; > > Is the ordering correct here? This is intentional. I'll add a comment (please see the below). > It seems like the fcopy daemon could receive > the cancel message and do the write before the state is forced to > HVUTIL_READY. This can not happen, because when we're here from util_suspend(), all the userspace processes have been frozen (please refer to another mail from me https://lkml.org/lkml/2020/1/13/1021). The userspace is thawed only after util_resume() and the other part of the resume procedure finish. When we're here in util_suspend(), we can be in any of the below states: #1: hv_utils has not queued any message to the userspace daemon. Now hv_fcopy_pre_suspend() queues a message to the daemon, and forces the state to HVUTIL_READY; the daemon should read the message without any error; later when the daemon calls write(), the write() returns -1 because fcopy_transaction.state != HVUTIL_USERSPACE_REQ and fcopy_on_msg() returns -EINVAL; the daemon responds to the write() error by closing and re-opening the dev file, which triggers a reset in the hv_utils driver: see hvt_op_release() -> hvt_reset() -> fcopy_on_reset(), and later the daemon registers itself to the hv_utils driver, and everything comes back to normal. #2: hv_utils has queued a message to the userspace daemon. Now hv_fcopy_pre_suspend() fails to queue an extra message to the daemon, but still forces the state to HVUTIL_READY. #2.1 the userspace has not read the message. The daemon reads the queued message and later the write() fails, so the daemon closes and re-opens the dev file. #2.2 the userspace has read the message, but has not called write() yet. The write() fails, so the daemon closes and re-opens the dev file. #2.3 the userspace has read the message, and has called write(). This is actualy the same as #1. So, IMO the patch should be handling the scenarios correctly. > > + > > + /* tasklet_enable() will be called in hv_fcopy_pre_resume(). */ > > + > > + return 0; > > +err: > > + tasklet_enable(&channel->callback_event); > > A nit, but if you did the memory allocation first, you could return -ENOMEM > immediately on error and avoid the err: label and re-enabling the tasklet. Good idea! I'll fix this. > > + return -ENOMEM; > > +} > > ... > > --- a/drivers/hv/hv_snapshot.c > > +++ b/drivers/hv/hv_snapshot.c > > @@ -229,6 +229,7 @@ static void vss_handle_request(struct work_struct > *dummy) > > vss_transaction.state = HVUTIL_HOSTMSG_RECEIVED; > > vss_send_op(); > > return; > > + > > Gratuitous blank line added? Will remove it. I probably tried to make the "return;" more noticeable. > > case VSS_OP_GET_DM_INFO: > > vss_transaction.msg->dm_info.flags = 0; > > break; > > ... > > +int hv_vss_pre_suspend(void) > > +{ > > ... > > + /* We don't care about the return value. */ > > + hvutil_transport_send(hvt, vss_msg, sizeof(*vss_msg), NULL); > > + > > + kfree(vss_msg); > > + > > + vss_transaction.state = HVUTIL_READY; > > Same comment about the ordering. I'll add a comment for this. I may add a long comment in util_suspend() and add a short comment here. > > + > > + /* tasklet_enable() will be called in hv_vss_pre_resume(). */ > > + > > + return 0; > > +err: > > + tasklet_enable(&channel->callback_event); > > + return -ENOMEM; > > Same comment about simplifying the error handling applies here. Will fix this. > > +static int util_suspend(struct hv_device *dev) > > +{ > > + struct hv_util_service *srv = hv_get_drvdata(dev); > > + int ret = 0; > > + > > + if (srv->util_pre_suspend) { > > + ret = srv->util_pre_suspend(); > > + > > Unneeded blank line? Will remove this. > > + if (ret) > > + return ret; > > + } > > + > > + vmbus_close(dev->channel); > > + > > + return 0; > > +} > > + > > +static int util_resume(struct hv_device *dev) > > +{ > > + struct hv_util_service *srv = hv_get_drvdata(dev); > > + int ret = 0; > > + > > + if (srv->util_pre_resume) { > > + ret = srv->util_pre_resume(); > > + > > Unneeded blank line? Will remove this. Thanks, -- Dexuan