> From: Michael Kelley <mikelley@xxxxxxxxxxxxx> > Sent: Tuesday, September 8, 2020 1:49 PM > > @@ -2635,6 +2632,15 @@ static int netvsc_resume(struct hv_device *dev) > > netvsc_devinfo_put(device_info); > > net_device_ctx->saved_netvsc_dev_info = NULL; > > > > + /* A NIC driver (e.g. mlx5) may keep the VF network interface across > > + * hibernation, but here the data path is implicitly switched to the > > + * netvsc NIC since the vmbus channel is closed and re-opened, so > > + * netvsc_vf_changed() must be used to switch the data path to the VF. > > + */ > > + vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev); > > + if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK) > > + ret = -EINVAL; > > + > > I'm a little late looking at this code. But a question: Is it possible for > netvsc_resume() to be called before the VF driver's resume function > is called? Yes, actually this is what happens 100% of the time. :-) Upon suspend: 1. the mlx5 driver suspends the VF NIC. 2. the pci-hyperv suspends the VF vmbus device, including closing the channel. 3. hv_netvsc suspends the netvsc vmbus device, including closing the channel. Note: between 1) and 3), the data path is still the mlx5 VF, but obviously the VF NIC is not working. IMO this is not an issue in practice, since typically we don't really expect this to work once the suspending starts. Upon resume: 1. hv_netvsc resumes the netvsc vmbus device, including opening the channel. At this time, the data path should be the netvsc NIC since we close and re-open the netvsc vmbus channel, and I believe the default data path is netvsc. With this patch, the data path is switched to the VF NIC in netvsc_resume() because "netif_running(vf_netdev)" is true for the mlx5 VF NIC (CX-4), though the VF NIC device is not resumed back yet. According to my test, I believe this switching succeeds. Note: when the host receives the VM's NVSP_MSG4_TYPE_SWITCH_DATA_PATH request, it looks the host doesn't check if the VF vmbus device and the VF PCI device are really "activated"[1], and it looks the host simply programs the FPGA GFT for the newly-requested data path, and the host doesn't send a response message to the VM, indicating if the switching is a success or a failure. So, at this time, any incoming Ethernet packets (except the broadcast/multicast and TCP SYN packets, which always go through the netvsc NIC on Azure) can not be received by the VF NIC, which has not been resumed yet. IMO this is not an issue in practice, since typically we don't really expect this to work before the resuming is fully finished. BTW, at this time the userspace is not thawed yet, so no application can transmit packets. 2. pci-hyperv resumes the VF vmbus device, including opening the channel. 3. the mlx5 driver resumes the VF NIC, and now everything is backed to normal. [1] In the future, if the host starts to check if the VF vmbus/PCI devices are activated upon the receipt of the VM's NVSP_MSG4_TYPE_SWITCH_DATA_PATH request, and refuses to switch the data path if they're not activated, then we'll be in trouble, but even if that happens, this patch itself doesn't make the situation worse. So ideally we need a mechanism to switch the data path to the mlx5 VF NIC *after* the mlx5 NIC is resumed. Unluckily it looks there is not a standard notification mechanism for this since the mlx5 driver preserves the VF network interface. I'll discuss with the Mellanox developer who implemented mlx5 hibernation support, and probably mlx5 should also destroy/re-create the VF network interface across hibernation, just as the mlx4 driver does. > If so, is it possible for netvsc_vf_changed() to find that the VF > is not up, and hence to switch the data path away from the VF instead of > to the VF? > > Michael When we are in netvsc_resume(), in my test "netif_running(vf_netdev)" is always true for the mlx5 VF NIC (CX-4), so netvsc_vf_changed() should switch the data path to the VF. static inline bool netif_running(const struct net_device *dev) { return test_bit(__LINK_STATE_START, &dev->state); } The flag __LINK_STATE_START is only cleared when the NIC is brought "DOWN", and that case is already automatically handled by netvsc_netdev_event(). For mlx4 (CX-3), net_device_ctx->vf_netdev is NULL in netvsc_resume(), so the CX-3 VF NIC is not affected by this patch. Thanks, -- Dexuan