On Fri, Jul 26, 2024 at 3:24 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > > On Wed, Jul 10, 2024 at 11:03:42AM +0800, Jason Wang wrote: > > On Tue, Jul 9, 2024 at 9:28 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > > > > > > On Tue, Jul 09, 2024 at 04:02:14PM +0800, Jason Wang wrote: > > > > This patch synchronize operstate with admin state per RFC2863. > > > > > > > > This is done by trying to toggle the carrier upon open/close and > > > > synchronize with the config change work. This allows propagate status > > > > correctly to stacked devices like: > > > > > > > > ip link add link enp0s3 macvlan0 type macvlan > > > > ip link set link enp0s3 down > > > > ip link show > > > > > > > > Before this patch: > > > > > > > > 3: enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 > > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff > > > > ...... > > > > 5: macvlan0@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 > > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff > > > > > > > > After this patch: > > > > > > > > 3: enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 > > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff > > > > ... > > > > 5: macvlan0@enp0s3: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 > > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff > > > > > > I think that the commit log is confusing. It seems to say that > > > the issue fixed is synchronizing state with hardware > > > config change. > > > But your example does not show any > > > hardware change. Isn't this example really just > > > a side effect of setting carrier off on close? > > > > The main goal for this patch is to make virtio-net follow RFC2863. The > > main thing that is missed is to synchronize the operstate with admin > > state, if we do this, we get several good results, one of the obvious > > one is to allow virtio-net to propagate status to the upper layer, for > > example if the admin state of the lower virtio-net is down it should > > be propagated to the macvlan on top, so I give the example of using a > > stacked device. I'm not we had others but the commit log is probably > > too small to say all of it. > > > > > > > > > > > > Cc: Venkat Venkatsubra <venkat.x.venkatsubra@xxxxxxxxxx> > > > > Cc: Gia-Khanh Nguyen <gia-khanh.nguyen@xxxxxxxxxx> > > > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx> > > > > > > Yes but this just forces lots of re-reads of config on each > > > open/close for no good reason. > > > > Does it really harm? Technically the link status could be changed > > several times when the admin state is down as well. > > It's a bunch of extra vmexits on each VM boot, yes. A new version is ready, will be posted to net-next soon. > > > > Config interrupt is handled in core, you can read once > > > on probe and then handle config changes. > > > > Per RFC2863, the code tries to avoid dealing with any operstate change > > via config space read when the admin state is down. > > what exactly in RFC2863 are you referring to? This? > (2) if ifAdminStatus is down, then ifOperStatus will normally also > be down (or notPresent) i.e., there is not (necessarily) a > fault condition on the interface. Yes. > So basically, just call virtio_config_driver_disable on close, > and then config interrupt will not trigger. > Why is that not enough? So when close (admin down), we need to call netif_carrier_off() to make (oper status down). Thanks > > > > > > > > > > > > > > > > > > > > > --- > > > > drivers/net/virtio_net.c | 64 ++++++++++++++++++++++++---------------- > > > > 1 file changed, 38 insertions(+), 26 deletions(-) > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > index 0b4747e81464..e6626ba25b29 100644 > > > > --- a/drivers/net/virtio_net.c > > > > +++ b/drivers/net/virtio_net.c > > > > @@ -2476,6 +2476,25 @@ static void virtnet_cancel_dim(struct virtnet_info *vi, struct dim *dim) > > > > net_dim_work_cancel(dim); > > > > } > > > > > > > > +static void virtnet_update_settings(struct virtnet_info *vi) > > > > +{ > > > > + u32 speed; > > > > + u8 duplex; > > > > + > > > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX)) > > > > + return; > > > > + > > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed); > > > > + > > > > + if (ethtool_validate_speed(speed)) > > > > + vi->speed = speed; > > > > + > > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex); > > > > + > > > > + if (ethtool_validate_duplex(duplex)) > > > > + vi->duplex = duplex; > > > > +} > > > > + > > > > static int virtnet_open(struct net_device *dev) > > > > { > > > > struct virtnet_info *vi = netdev_priv(dev); > > > > @@ -2494,6 +2513,18 @@ static int virtnet_open(struct net_device *dev) > > > > goto err_enable_qp; > > > > } > > > > > > > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) { > > > > + virtio_config_driver_enable(vi->vdev); > > > > + /* Do not schedule the config change work as the > > > > + * config change notification might have been disabled > > > > + * by the virtio core. */ > > > > > > I don't get why you need this. > > > If the notification was disabled it will just trigger later. > > > This is exactly why using core is a good idea. > > > > So we need a read here (this seems to be not rare for most modern > > hardware NICs) because we don't know if the link status is changed or > > not and it is not guaranteed by virtio_config_driver_enable() since it > > only works when there's a pending config change. Another thing is that > > the device is being freezed, so the virtio core may prevent the device > > from accessing the device. > > > > So using virtio_config_changed() will guaranteed that: > > > > 1) if the device is not being freezed, it will read the config space soon > > 2) if the device is being freezed, the read of the config space will > > be delayed to resume > > > > > > > > > > > > + virtio_config_changed(vi->vdev); > > > > + } else { > > > > + vi->status = VIRTIO_NET_S_LINK_UP; > > > > + virtnet_update_settings(vi); > > > > > > > > > And why do we need this here I don't get at all. > > > > See above, because doing this on a probe is racy and buggy: The > > opersate is up even if the adminstate is not, this is conflict with > > RFC2863: > > > > " > > If ifAdminStatus is down(2) then ifOperStatus > > should be down(2) > > " > > > > > > > > > + netif_carrier_on(dev); > > > > + } > > > > > > > > > > > > > + > > > > return 0; > > > > > > > > err_enable_qp: > > > > @@ -2936,12 +2967,19 @@ static int virtnet_close(struct net_device *dev) > > > > disable_delayed_refill(vi); > > > > /* Make sure refill_work doesn't re-enable napi! */ > > > > cancel_delayed_work_sync(&vi->refill); > > > > + /* Make sure config notification doesn't schedule config work */ > > > > + virtio_config_driver_disable(vi->vdev); > > > > + /* Make sure status updating is cancelled */ > > > > + cancel_work_sync(&vi->config_work); > > > > > > > > for (i = 0; i < vi->max_queue_pairs; i++) { > > > > virtnet_disable_queue_pair(vi, i); > > > > virtnet_cancel_dim(vi, &vi->rq[i].dim); > > > > } > > > > > > > > + vi->status &= ~VIRTIO_NET_S_LINK_UP; > > > > + netif_carrier_off(dev); > > > > + > > > > return 0; > > > > } > > > > > > > > @@ -4640,25 +4678,6 @@ static void virtnet_init_settings(struct net_device *dev) > > > > vi->duplex = DUPLEX_UNKNOWN; > > > > } > > > > > > > > -static void virtnet_update_settings(struct virtnet_info *vi) > > > > -{ > > > > - u32 speed; > > > > - u8 duplex; > > > > - > > > > - if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX)) > > > > - return; > > > > - > > > > - virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed); > > > > - > > > > - if (ethtool_validate_speed(speed)) > > > > - vi->speed = speed; > > > > - > > > > - virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex); > > > > - > > > > - if (ethtool_validate_duplex(duplex)) > > > > - vi->duplex = duplex; > > > > -} > > > > - > > > > static u32 virtnet_get_rxfh_key_size(struct net_device *dev) > > > > { > > > > return ((struct virtnet_info *)netdev_priv(dev))->rss_key_size; > > > > @@ -6000,13 +6019,6 @@ static int virtnet_probe(struct virtio_device *vdev) > > > > /* Assume link up if device can't report link status, > > > > otherwise get link status from config. */ > > > > netif_carrier_off(dev); > > > > - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) { > > > > - schedule_work(&vi->config_work); > > > > - } else { > > > > - vi->status = VIRTIO_NET_S_LINK_UP; > > > > - virtnet_update_settings(vi); > > > > - netif_carrier_on(dev); > > > > - } > > > > > > > > > Here it all made sense - we were reading config for the 1st time. > > > > See above. > > > > Thanks > > > > > > > > > > > > for (i = 0; i < ARRAY_SIZE(guest_offloads); i++) > > > > if (virtio_has_feature(vi->vdev, guest_offloads[i])) > > > > -- > > > > 2.31.1 > > > > > > >