On 9/30/2022 6:33 PM, Haiyang Zhang wrote:
-----Original Message-----
From: Jakub Kicinski <kuba@xxxxxxxxxx>
Sent: Thursday, September 29, 2022 10:26 PM
To: Gaurav Kohli <gauravkohli@xxxxxxxxxxxxxxxxxxx>
Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
<haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger
<sthemmin@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
<decui@xxxxxxxxxxxxx>; linux-hyperv@xxxxxxxxxxxxxxx;
netdev@xxxxxxxxxxxxxxx
Subject: Re: [PATCH net] hv_netvsc: Fix race between VF offering and VF
association message from host
On Wed, 28 Sep 2022 06:48:33 -0700 Gaurav Kohli wrote:
During vm boot, there might be possibility that vf registration
call comes before the vf association from host to vm.
And this might break netvsc vf path, To prevent the same block
vf registration until vf bind message comes from host.
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
Signed-off-by: Gaurav Kohli <gauravkohli@xxxxxxxxxxxxxxxxxxx>
Is it possible to add a timeout or such? Waiting for an external
event while holding rtnl lock seems a little scary.
We used to have time-out in many places of this driver. But there is
no protocol guarantees of the host response time, so the time out value
cannot be set. These time-outs were removed several years ago.
The other question is - what protects the completion and ->vf_alloc
from races? Is there some locking? ->vf_alloc only goes from 0 to 1
and never back?
Thanks for the comment, i understand your concern for vf_alloc and
reinit completion part, I think
we can move reinit completion to unregistration part of vf code.
Let me send v2 patch.
When Vf is removed, the vf_assoc msg will set it to 0 here:
net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
Also, I think this condition can be changed from:
+ if (vf_is_up && !net_device_ctx->vf_alloc) {
Thanks for the comment.
This is needed to maintain state machine, as netvsc change event can
comes multiple time. That's why i have put
extra check to avoid any deadlock.
to:
+ if (vf_is_up) {
So when VF comes up, it always wait for the completion without depending
on the vf_alloc.
Thanks,
- Haiyang