On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
>
> On 2021/1/13 7:47 AM, Willem de Bruijn wrote:
> > On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
> > <yuri.benditovich@xxxxxxxxxx> wrote:
> >> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
> >> <yuri.benditovich@xxxxxxxxxx> wrote:
> >>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> >>> <yuri.benditovich@xxxxxxxxxx> wrote:
> >>>> The existing TUN module is able to use a provided "steering eBPF"
> >>>> program to calculate a per-packet hash and derive the destination
> >>>> queue in which to place the packet. The eBPF program uses mapped
> >>>> configuration data containing a key for hash calculation and an
> >>>> indirection table with an array of queue indices.
> >>>>
> >>>> This series of patches adds support for the virtio-net hash
> >>>> reporting feature as defined in the virtio specification. It
> >>>> extends the TUN module and the "steering eBPF" as follows:
> >>>>
> >>>> The extended steering eBPF program calculates the hash value and
> >>>> hash type, keeps the hash value in skb->hash, and returns the index
> >>>> of the destination virtqueue and the type of the hash. The TUN
> >>>> module keeps the returned hash type in a (currently unused) field
> >>>> of the skb: skb->__unused, renamed to 'hash_report_type'.
> >>>>
> >>>> When the TUN module is later called to allocate and fill the
> >>>> virtio-net header and push it to the destination virtqueue, it
> >>>> populates the hash and the hash type into the virtio-net header.
> >>>>
> >>>> The VHOST driver is made aware of the respective virtio-net feature
> >>>> that extends the virtio-net header to report the hash value and
> >>>> hash report type.
> >>>
> >>> Comment from Willem de Bruijn:
> >>>
> >>> Skbuff fields are in short supply. I don't think we need to add one
> >>> just for this narrow path entirely internal to the tun device.
> >>>
> >> We understand that and try to minimize the impact by using an already
> >> existing unused field of the skb.
> >
> > Not anymore. It was repurposed as a flags field very recently.
> >
> > This use case is also very narrow in scope. And a very short path from
> > data producer to consumer. So I don't think it needs to claim scarce
> > bits in the skb.
> >
> > tun_ebpf_select_queue stores the field, tun_put_user reads it and
> > converts it to the virtio_net_hdr in the descriptor.
> >
> > tun_ebpf_select_queue is called from .ndo_select_queue. Storing the
> > field in skb->cb is fragile, as in theory some code could overwrite
> > that field between ndo_select_queue and ndo_start_xmit/tun_net_xmit,
> > from which point it is fully under tun control again. But in
> > practice, I don't believe anything does.
> >
> > Alternatively, an existing skb field that is used only on disjoint
> > datapaths, such as ingress-only, could be viable.
>
>
> A question here. We had metadata support in XDP for cooperation between
> eBPF programs. Do we have something similar in the skb?
>
> E.g. in the RSS case, if we want to pass some metadata information
> between the eBPF program and the logic that generates the vnet header
> (either hard logic in the kernel or another eBPF program), is there any
> way to avoid possible conflicts with qdiscs?

Not that I am aware of. The closest thing is cb[]. It would have to
alias a field like that, one that is known to be unused on the given
path.

One other approach that has been used within linear call stacks is out
of band, like the percpu variables softnet_data.xmit.more and
mirred_rec_level. But that is perhaps a bit overwrought for this use
case.

> >
> >>> Instead, you could just run the flow_dissector in tun_put_user if the
> >>> feature is negotiated. Indeed, the flow dissector seems more apt to me
> >>> than BPF here. Note that the flow dissector internally can be
> >>> overridden by a BPF program if the admin so chooses.
> >>>
> >> While this set of patches is related to hash delivery in the
> >> virtio-net packet in general, it was prepared in the context of the
> >> RSS feature implementation as defined in the virtio spec [1].
> >> In the case of RSS it is not enough to run the flow_dissector in
> >> tun_put_user: in tun_ebpf_select_queue the TUN module calls eBPF to
> >> calculate the hash, hash type and queue index according to the
> >> (mapped) parameters (key, hash types, indirection table) received
> >> from the guest.
> >
> > TUNSETSTEERINGEBPF was added to support more diverse queue selection
> > than the default in the case of multiqueue tun. Not sure what the
> > exact use cases are.
> >
> > But RSS is exactly the purpose of the flow dissector. It is used for
> > that purpose in the software variant, RPS. The flow dissector
> > implements a superset of the RSS spec, and certainly computes a
> > four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
> > has already computed a 4-tuple hash.
> >
> > What it does not give is a type indication, such as
> > VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be
> > used. In datapaths where the NIC has already computed the four-tuple
> > hash and stored it in skb->hash --the common case for servers--, that
> > type field is the only reason to have to compute again.
>
>
> The problem is that there's no guarantee that the packet comes from the
> NIC; it could be a simple VM2VM or host2VM packet.
>
> And even if the packet is coming from a NIC that calculates the hash,
> there's no guarantee that it's the hash that the guest wants (the guest
> may use different RSS keys).

Ah yes, of course.

I would still revisit the need to store a detailed hash_type along with
the hash, as, as far as I can tell, it conveys no actionable information
to the guest.