Re: Nic flaps for 1 minute when reconnecting

Francois <rigault.francois@xxxxxxxxx> · Sat, 20 Apr 2024 11:29:30 +0200

On Fri, 19 Apr 2024 at 20:15, Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Francois <rigault.francois@xxxxxxxxx>
> > Sent: Sunday, April 14, 2024 1:15 PM
> > To: linux-hyperv@xxxxxxxxxxxxxxx
> > Subject: Nic flaps for 1 minute when reconnecting
> >

> > (Nic Connected) sent to Nic
> > 40DBAAF6-D408-452F-BC2E-B76AAF065732--B670C9DF-AB50-49C4-...
> > 24/12/2023 09:39:33             220 Information      Status change
> > (Nic Connected) sent to Nic
> > 40DBAAF6-D408-452F-BC2E-B76AAF065732--B670C9DF-AB50-49C4-...
> > 24/12/2023 09:39:33             220 Information      Status change
> > (Nic Disconnected) sent to Nic
> > 40DBAAF6-D408-452F-BC2E-B76AAF065732--B670C9DF-AB50-49...
> > 24/12/2023 09:39:33             220 Information      Status change
> > (Nic Disconnected) sent to Nic
> > 40DBAAF6-D408-452F-BC2E-B76AAF065732--B670C9DF-AB50-49...
> > Thanks!
> > Francois
>
> The 2 seconds delay is necessary for the upper layers, like link_watch
> infrastructure, and userspace to handle the status change properly.
>

Hi, thanks for your response!
I understand the need to split a "change" event into 2 separate events, but I
don't really understand why there needs to be a 2-second delay between
them. Surely other network drivers do not add that delay artificially?

In my case, many events are received (instead of a single
disconnect/reconnect), and they are all queued and processed sequentially,
so in practice the VM is unusable for a minute or so. It happens
unexpectedly, and I have no idea what is causing it.
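A rough way to see how a backlog turns into a minute of flapping: assume (a simplification, not the driver's exact scheduling) that the first queued event is handled immediately and each later one waits the fixed 2-second gap mentioned above. The helper `drain_seconds()` below is hypothetical and only does that arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not netvsc code: the first queued link event is
 * handled right away and every following event waits delay_s seconds
 * after the previous one. Returns the seconds needed to drain the
 * whole backlog. */
static unsigned long drain_seconds(size_t queued_events, unsigned long delay_s)
{
    if (queued_events == 0)
        return 0;
    return (unsigned long)(queued_events - 1) * delay_s;
}
```

Under that assumption, a burst of 31 queued events already costs drain_seconds(31, 2) = 60 seconds of link changes, which is the order of magnitude I am seeing.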

I don't think I have a way to dig into how Windows or Hyper-V sends
these events, so I am living with the patch that reduces the delay. What would
you think of first adding a log along these lines

--- a/drivers/net/hyperv/netvsc_drv.c   2024-04-20 08:48:09.105928816 +0200
+++ b/drivers/net/hyperv/netvsc_drv.c   2024-04-20 08:57:28.254412513 +0200
@@ -2080,6 +2080,10 @@
        ndev_ctx->last_reconfig = jiffies;

        spin_lock_irqsave(&ndev_ctx->lock, flags);
+       size_t len = list_count_nodes(&ndev_ctx->reconfig_events);
+       if (len > 5) {
+               netdev_warn(net, "handle storm depth=%zu\n", len);
+       }
        if (!list_empty(&ndev_ctx->reconfig_events)) {
                event = list_first_entry(&ndev_ctx->reconfig_events,
                                         struct netvsc_reconfig, list);

to warn the user that something is wrong and that events are being queued
unnecessarily? Hopefully someone will notice, and more users will be able to
chime in with reports.
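For anyone who wants to try the threshold logic outside the kernel, here is a small userspace sketch of the same idea. All names are hypothetical stand-ins; only the "count the queued events, warn above 5" check mirrors the proposed patch (in the kernel, list_count_nodes() does the counting). Note that `%zu` is the matching format specifier for a size_t.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued link reconfig event. */
struct reconfig_event {
    int event;                      /* e.g. 0 = disconnect, 1 = connect */
    struct reconfig_event *next;
};

struct event_queue {
    struct reconfig_event *head;
    struct reconfig_event *tail;
};

#define STORM_THRESHOLD 5           /* same cut-off as "len > 5" in the patch */

/* Append an event at the tail, as new link events arrive. */
static int queue_push(struct event_queue *q, int event)
{
    struct reconfig_event *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->event = event;
    e->next = NULL;
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    return 0;
}

/* Userspace equivalent of list_count_nodes(): walk the list and count. */
static size_t queue_count(const struct event_queue *q)
{
    size_t n = 0;
    for (const struct reconfig_event *e = q->head; e; e = e->next)
        n++;
    return n;
}

/* Warn when the backlog exceeds the threshold; returns 1 if it warned. */
static int check_storm(const struct event_queue *q)
{
    size_t len = queue_count(q);
    if (len > STORM_THRESHOLD) {
        fprintf(stderr, "handle storm depth=%zu\n", len);
        return 1;
    }
    return 0;
}

/* Release every queued event and reset the queue. */
static void queue_free(struct event_queue *q)
{
    struct reconfig_event *e = q->head;
    while (e) {
        struct reconfig_event *next = e->next;
        free(e);
        e = next;
    }
    q->head = q->tail = NULL;
}
```

With five queued events nothing is printed; the sixth one triggers the warning, matching the strict "> 5" comparison in the patch.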

>
> Thanks,
> - Haiyang
>

Thanks!
Francois