On Tue, Mar 11, 2025 at 1:49 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote: > > On 2025/03/11 9:47, Jason Wang wrote: > > On Mon, Mar 10, 2025 at 2:53 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote: > >> > >> On 2025/03/10 12:55, Jason Wang wrote: > >>> On Fri, Mar 7, 2025 at 7:01 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote: > >>>> > >>>> They are useful to implement VIRTIO_NET_F_RSS and > >>>> VIRTIO_NET_F_HASH_REPORT. > >>>> > >>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> > >>>> Tested-by: Lei Yang <leiyang@xxxxxxxxxx> > >>>> --- > >>>> include/linux/virtio_net.h | 188 +++++++++++++++++++++++++++++++++++++++++++++ > >>>> 1 file changed, 188 insertions(+) > >>>> > >>>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h > >>>> index 02a9f4dc594d02372a6c1850cd600eff9d000d8d..426f33b4b82440d61b2af9fdc4c0b0d4c571b2c5 100644 > >>>> --- a/include/linux/virtio_net.h > >>>> +++ b/include/linux/virtio_net.h > >>>> @@ -9,6 +9,194 @@ > >>>> #include <uapi/linux/tcp.h> > >>>> #include <uapi/linux/virtio_net.h> > >>>> > >>>> +struct virtio_net_hash { > >>>> + u32 value; > >>>> + u16 report; > >>>> +}; > >>>> + > >>>> +struct virtio_net_toeplitz_state { > >>>> + u32 hash; > >>>> + const u32 *key; > >>>> +}; > >>>> + > >>>> +#define VIRTIO_NET_SUPPORTED_HASH_TYPES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \ > >>>> + VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \ > >>>> + VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \ > >>>> + VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \ > >>>> + VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \ > >>>> + VIRTIO_NET_RSS_HASH_TYPE_UDPv6) > >>> > >>> Let's explain why > >>> > >>> #define VIRTIO_NET_HASH_REPORT_IPv6_EX 7 > >>> #define VIRTIO_NET_HASH_REPORT_TCPv6_EX 8 > >>> #define VIRTIO_NET_HASH_REPORT_UDPv6_EX 9 > >>> > >>> are missed here. > >> > >> Because they require parsing IPv6 options and I'm not sure how many we > >> need to parse. QEMU's eBPF program has a hard-coded limit of 30 options; > >> it has some explanation for this limit, but it does not seem definitive > >> either: > >> https://gitlab.com/qemu-project/qemu/-/commit/f3fa412de28ae3cb31d38811d30a77e4e20456cc#6ec48fc8af2f802e92f5127425e845c4c213ff60_0_165 > >> > > > > How about the usersapce datapath RSS in Qemu? (We probably don't need > > to align with eBPF RSS as it's just a reference implementation) > > The userspace datapath RSS has no limit. > > The reference implementation is the userspace datapath. The eBPF program > is intended to bring real performance benefit to Windows guests in > contrary. > > The userspace implementation does its best to provide defined RSS > capabilities but may not be performant. Parsing all IPv6 options have a > performance implication, but it is fine because it is not intended to be > performant in the first place. > > The performance problem is inherent to the userspace implementation, > which adds an extra overhead to the datapath. The eBPF program on the > other hand does not incur such overhead because it replaces the existing > steering algorithm (automq) instead of adding another layer. Hence the > eBPF program can be practical. > > That said, it is not that important to align with the userspace and eBPF > RSS in QEMU because they are still experimental anyway; the eBPF RSS has > potential to become a practical implementation but it is still in > development. The libvirt integration for the eBPF RSS is still not > complete, and we occasionally add fixes for RSS and hash reporting > without backporting to the stable branch. > > I'm adding interfaces to negotiate hash types rather for the future > extensibility. The specification may gain more hash types in the future > and other vhost backends may have a different set of hash types > supported. Figuring out how to deal with different sets of supported > hash typs is essential for both the kernel and QEMU. > > > > >> In this patch series, I add an ioctl to query capability instead; it > >> allows me leaving those hash types unimplemented and is crucial to > >> assure extensibility for future additions of hash types anyway. Anyone > >> who find these hash types useful can implement in the future. > > > > Yes, but we need to make sure no userspace visible behaviour changes > > after migration. > > Indeed, the goal is to make extensibility and migration compatible. So I see this part: + uint32_t supported_hash_types = n->rss_data.supported_hash_types; + uint32_t peer_hash_types = n->rss_data.peer_hash_types; + bool use_own_hash = + (supported_hash_types & VIRTIO_NET_RSS_SUPPORTED_HASHES) == + supported_hash_types; + bool use_peer_hash = + n->rss_data.peer_hash_available && + (supported_hash_types & peer_hash_types) == supported_hash_types; It looks like it would be a challenge to support vhost-user in the future if vhost-user supports hash feature others than source? > > > > >> > >>> > >>> And explain how we could maintain migration compatibility > >>> > >>> 1) Does those three work for userspace datapath in Qemu? If yes, > >>> migration will be broken. > >> > >> They work for userspace datapath so my RFC patch series for QEMU uses > >> TUNGETVNETHASHCAP to prevent breaking migration: > >> https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@xxxxxxxxxx/ > >> > > > > Ok, let's mention this in the cover letter. Another interesting thing > > is the migration from 10.0 to 9.0. > > The patch series is already mentioned in the cover letter. A description > of the intended use case of TUNGETVNETHASHCAP will be a good addition. > I'll add it to this patch so that it will be kept in tree after it gets > merged. > > Migration between two different QEMU versions should be handled with > versioned machine types. > > When a machine created in 9.0 is being migrated to 10.0, the machine > must set the hash type properties to match with the hash types supported > by the existing implementations, which means it sets the property for > VIRTIO_NET_HASH_REPORT_IPv6_EX to true, for example. Because this hash > type is currently not included in TUNGETVNETHASHCAP, the machine will > keep using the implementation used previously. The machine can be also > migrated back to 9.0 again. > > A machine type with version 10.0 cannot be migrated to 9.0 by design so > there is no new problem. I meant migrate qemu 11.0 with machine type 10.0 to qemu 10.0 with machine 10.0 etc. > > > > >> This patch series first adds configuration options for users to choose > >> hash types. QEMU then automatically picks one implementation from the > >> following (the earlier one is the more preferred): > >> 1) The hash capability of vhost hardware > >> 2) The hash capability I'm proposing here > >> 3) The eBPF program > >> 4) The pure userspace implementation > >> > >> This decision depends on the following: > >> - The required hash types; supported ones are queried for 1) and 2) > >> - Whether vhost is enabled or not and what vhost backend is used > >> - Whether hash reporting is enabled; 3) is incompatible with this > >> > >> The network device will not be realized if no implementation satisfies > >> the requirements. > > > > This makes sense, let's add this in the cover letter. > > I'll add it to the QEMU patch as it's more about details of QEMU. > The message of this patch will explain how TUNGETVNETHASHCAP and > TUNSETVNETHASH makes extensibility and migrattion compatible in general. > > Regards, > Akihiko Odaki > > > > >> > >>> 2) once we support those three in the future. For example, is the qemu > >>> expected to probe this via TUNGETVNETHASHCAP in the destination and > >>> fail the migration? > >> > >> QEMU is expected to use TUNGETVNETHASHCAP, but it can selectively enable > >> hash types with TUNSETVNETHASH to keep migration working. > >> > >> In summary, this patch series provides a sufficient facility for the > >> userspace to make extensibility and migration compatible; > >> TUNGETVNETHASHCAP exposes all of the kernel capabilities and > >> TUNSETVNETHASH allows the userspace to limit them. > >> > >> Regards, > >> Akihiko Odaki > > > > Fine. > > > > Thanks > > Thanks