> From: Leonid Bloch <leonidb@xxxxxxxxxxxxxx>
> Sent: Friday, June 4, 2021 1:14 AM
> To: Dexuan Cui <decui@xxxxxxxxxxxxx>; KY Srinivasan <kys@xxxxxxxxxxxxx>;
> Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger
> <sthemmin@xxxxxxxxxxxxx>; Wei Liu <wei.liu@xxxxxxxxxx>; Long Li
> <longli@xxxxxxxxxxxxx>
> Cc: linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> Subject: Re: [BUG] hv_netvsc: Unbind exits before the VFs bound to it are
> unregistered
>
> On 6/3/21 9:04 PM, Dexuan Cui wrote:
> >> From: Leonid Bloch <leonidb@xxxxxxxxxxxxxx>
> >> Sent: Thursday, June 3, 2021 5:35 AM
> >> To: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> >> <haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger
> >> <sthemmin@xxxxxxxxxxxxx>; Wei Liu <wei.liu@xxxxxxxxxx>; Dexuan Cui
> >> <decui@xxxxxxxxxxxxx>
> >> Cc: linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> >> Subject: [BUG] hv_netvsc: Unbind exits before the VFs bound to it are
> >> unregistered
> >>
> >> Hi,
> >>
> >> When I try to unbind a network interface from hv_netvsc and bind it to
> >> uio_hv_generic, once in a while I get the following kernel panic (please
> >> note the first two lines: it seems as if uio_hv_generic is registered
> >> before the VF bound to hv_netvsc is unregistered):
> >>
> >> [Jun 3 09:04] hv_vmbus: registering driver uio_hv_generic
> >> [ +0.002215] hv_netvsc 5e089342-8a78-4b76-9729-25c81bd338fc eth2: VF unregistering: eth5
> >> [ +1.088078] BUG: scheduling while atomic: swapper/8/0/0x00010003
> >> [ +0.000001] BUG: scheduling while atomic: swapper/3/0/0x00010003
> >> [ +0.000001] BUG: scheduling while atomic: swapper/6/0/0x00010003
> >> [ +0.000000] BUG: scheduling while atomic: swapper/7/0/0x00010003
> >> [ +0.000005] Modules linked in:
> >> [ +0.000001] Modules linked in:
> >> [ +0.000001] uio_hv_generic
> >> [ +0.000000] Modules linked in:
> >> [ +0.000000] Modules linked in:
> >> [ +0.000001] uio_hv_generic uio
> >> [ +0.000001] uio
> >> [ +0.000000] uio_hv_generic
> >> [ +0.000000] uio_hv_generic
> >> ...
> >>
> >> I run kernel 5.10.27, unmodified, besides RT patch v36, on Azure Stack
> >> Edge platform, software version 2105 (2.2.1606.3320).
> >>
> >> I perform the bind-unbind using the following script (please note the
> >> comment inline):
> >>
> >> net_uuid="f8615163-df3e-46c5-913f-f2d2f965ed0e"
> >> dev_uuid="$(basename "$(readlink "/sys/class/net/eth1/device")")"
> >> modprobe uio_hv_generic
> >> echo "${net_uuid}" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id
> >> printf "%s" "${dev_uuid}" > /sys/bus/vmbus/drivers/hv_netvsc/unbind
> >> ### If I insert 'sleep 1' here - all works correctly
> >> printf "%s" "${dev_uuid}" > /sys/bus/vmbus/drivers/uio_hv_generic/bind
> >>
> >> Thanks,
> >> Leonid.
> >
> > It would be great if you can test the mainline kernel, which I suspect also
> > has the bug.
> >
> > It looks like netvsc_remove() -> netvsc_unregister_vf() does the unbinding work
> > in a synchronous manner. I don't know why the bug happens.
> >
> > Right now I don't have a DPDK setup to test this, but I think the bug can
> > be worked around by unbinding the PCI VF device from the pci-hyperv driver
> > before unbinding the netvsc device, and re-binding the VF device after binding
> > the netvsc device to uio_hv_generic.
> >
> > Thanks,
> > -- Dexuan
> >
>
> Hi Dexuan,
>
> Thanks for your reply. I can check for myself only next week, as I am
> out of office now, but do you think that the reason might be using
> cancel_delayed_work_sync(), instead of cancel_delayed_work() in
> netvsc_unregister_vf()?

I'm not sure. I don't understand how the error happens:
[ +1.088078] BUG: scheduling while atomic: swapper/8/0/0x00010003

> And if the above is not correct, can you please advise on a way of
> finding the corresponding VF device from userspace, given the kernel
> name of the parent device? I did not find it in sysfs so far.
>
> Thanks,
> Leonid.

The VF NIC interface's MAC address is the same as that of the matching netvsc NIC.
We should be able to find the <netvsc NIC, VF NIC> pair by checking /sys/class/net/*/address.
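A minimal POSIX shell sketch of that lookup, assuming (as stated above) that the VF NIC and its netvsc NIC report identical MAC addresses in /sys/class/net/*/address. The function name find_vf_for_netvsc and the SYSFS_NET override (there only so the function can be pointed at a test directory) are hypothetical helpers, not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper (sketch): print every interface under /sys/class/net
# whose MAC address equals that of the given netvsc interface, relying on
# the assumption that a VF NIC and its netvsc NIC share the same MAC.
# SYSFS_NET is a hypothetical override, used only for testing.
find_vf_for_netvsc() {
    netvsc_if="$1"
    net_dir="${SYSFS_NET:-/sys/class/net}"

    # Read the MAC of the netvsc NIC; fail if the interface does not exist.
    mac="$(cat "${net_dir}/${netvsc_if}/address")" || return 1

    status=1  # set to 0 once at least one matching interface is printed
    for addr_file in "${net_dir}"/*/address; do
        [ -r "${addr_file}" ] || continue
        ifname="$(basename "$(dirname "${addr_file}")")"
        # Skip the netvsc interface itself.
        [ "${ifname}" = "${netvsc_if}" ] && continue
        if [ "$(cat "${addr_file}")" = "${mac}" ]; then
            printf '%s\n' "${ifname}"
            status=0
        fi
    done
    return "${status}"
}

# Example (run before unbinding eth1 from hv_netvsc):
# vf_if="$(find_vf_for_netvsc eth1)"
```

In the script above, this would need to run before the write to /sys/bus/vmbus/drivers/hv_netvsc/unbind: once hv_netvsc is unbound, the netvsc interface is unregistered and its address file is gone. If other interfaces (e.g. a bond) share the same MAC, more than one name may be printed, so the result may need further filtering.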