On Tue, Aug 16, 2022 at 12:38:52PM -0400, Michael S. Tsirkin wrote: > On Mon, Aug 15, 2022 at 10:56:06AM -0700, Bobby Eshleman wrote: > > In order to support usage of qdisc on vsock traffic, this commit > > introduces a struct net_device to vhost and virtio vsock. > > > > Two new devices are created, vhost-vsock for vhost and virtio-vsock > > for virtio. The devices are attached to the respective transports. > > > > To bypass the usage of the device, the user may "down" the associated > > network interface using common tools. For example, "ip link set dev > > virtio-vsock down" lets vsock bypass the net_device and qdisc entirely, > > simply using the FIFO logic of the prior implementation. > > Ugh. That's quite a hack. Mark my words, at some point we will decide to > have down mean "down". Besides, multiple net devices with link up tend > to confuse userspace. So might want to keep it down at all times > even short term. > I have to admit, this choice was born more of perceived necessity than of a love for the design... but I can explain the pain points that led to the current state, which I hope sparks some discussion. When the state is down, dev_queue_xmit() will fail. To avoid this and preserve the "zero-configuration" guarantee of vsock, I chose to make transmission work regardless of device state by implementing this "ignore up/down state" hack. This is unfortunate because what we are really after here is just packet scheduling, i.e., qdisc. We don't really need the rest of the net_device, and I don't think up/down buys us anything of value. The relationship between qdisc and net_device is so tightly knit together though, that using qdisc without a net_device doesn't look very practical (and maybe impossible). Some alternative routes might be: 1) Default the state to up, and let users disable vsock by downing the device if they'd like. It still works out-of-the-box, but if users really want to disable vsock they may. 2) vsock may simply turn the device to state up when a socket is first used. For instance, the HCI device in net/bluetooth/hci_* uses a technique where the net_device is turned to up when bind() is called on any HCI socket (BTPROTO_HCI). It can also be turned up/down via ioctl(). 3) Modify net_device registration to allow us to have an invisible device that is only known by the kernel. It may default to up and remain unchanged. The qdisc controls alone may be exposed to userspace, hopefully via netlink to still work with tc. This is not currently supported by register_netdevice(), but a series from 2007 was sent to the ML, tentatively approved in concept, but was never merged[1]. 4) Currently NETDEV_UP/NETDEV_DOWN commands can't be vetoed. NETDEV_PRE_UP, however, is used to effectively veto NETDEV_UP commands[2]. We could introduce NETDEV_PRE_DOWN to support vetoing of NETDEV_DOWN. This would allow us to install a hook to determine if we actually want to allow the device to be downed. In an ideal world, we could simply pass a set of netdev queues, a packet, and maybe a blob of state to qdisc and let it work its scheduling magic... Any thoughts? [1]: https://lore.kernel.org/netdev/20070129140958.0cf6880f@freekitty/ [2]: https://lore.kernel.org/all/20090529.220906.243061042.davem@xxxxxxxxxxxxx/ Thanks, Bobby