On Mon, Mar 10, 2025 at 2:53 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
>
> On 2025/03/10 12:55, Jason Wang wrote:
> > On Fri, Mar 7, 2025 at 7:01 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>
> >> They are useful to implement VIRTIO_NET_F_RSS and
> >> VIRTIO_NET_F_HASH_REPORT.
> >>
> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@xxxxxxxxxx>
> >> Tested-by: Lei Yang <leiyang@xxxxxxxxxx>
> >> ---
> >>  include/linux/virtio_net.h | 188 +++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 188 insertions(+)
> >>
> >> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> >> index 02a9f4dc594d02372a6c1850cd600eff9d000d8d..426f33b4b82440d61b2af9fdc4c0b0d4c571b2c5 100644
> >> --- a/include/linux/virtio_net.h
> >> +++ b/include/linux/virtio_net.h
> >> @@ -9,6 +9,194 @@
> >>  #include <uapi/linux/tcp.h>
> >>  #include <uapi/linux/virtio_net.h>
> >>
> >> +struct virtio_net_hash {
> >> +        u32 value;
> >> +        u16 report;
> >> +};
> >> +
> >> +struct virtio_net_toeplitz_state {
> >> +        u32 hash;
> >> +        const u32 *key;
> >> +};
> >> +
> >> +#define VIRTIO_NET_SUPPORTED_HASH_TYPES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
> >> +                                         VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
> >> +                                         VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
> >> +                                         VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
> >> +                                         VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
> >> +                                         VIRTIO_NET_RSS_HASH_TYPE_UDPv6)
> >
> > Let's explain why
> >
> > #define VIRTIO_NET_HASH_REPORT_IPv6_EX         7
> > #define VIRTIO_NET_HASH_REPORT_TCPv6_EX        8
> > #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
> >
> > are missing here.
>
> Because they require parsing IPv6 options and I'm not sure how many we
> need to parse. QEMU's eBPF program has a hard-coded limit of 30 options;
> it has some explanation for this limit, but it does not seem definitive
> either:
> https://gitlab.com/qemu-project/qemu/-/commit/f3fa412de28ae3cb31d38811d30a77e4e20456cc#6ec48fc8af2f802e92f5127425e845c4c213ff60_0_165
>

How about the userspace datapath RSS in QEMU? (We probably don't need
to align with eBPF RSS as it's just a reference implementation.)

> In this patch series, I add an ioctl to query the capability instead; it
> allows me to leave those hash types unimplemented and is crucial to
> ensure extensibility for future additions of hash types anyway. Anyone
> who finds these hash types useful can implement them in the future.

Yes, but we need to make sure no userspace-visible behaviour changes
after migration.

> >
> > And explain how we could maintain migration compatibility
> >
> > 1) Do those three work for the userspace datapath in QEMU? If yes,
> > migration will be broken.
>
> They work for the userspace datapath, so my RFC patch series for QEMU uses
> TUNGETVNETHASHCAP to prevent breaking migration:
> https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@xxxxxxxxxx/
>

OK, let's mention this in the cover letter. Another interesting thing
is the migration from 10.0 to 9.0.

> This patch series first adds configuration options for users to choose
> hash types.
> QEMU then automatically picks one implementation from the
> following (the earlier one is preferred):
> 1) The hash capability of vhost hardware
> 2) The hash capability I'm proposing here
> 3) The eBPF program
> 4) The pure userspace implementation
>
> This decision depends on the following:
> - The required hash types; supported ones are queried for 1) and 2)
> - Whether vhost is enabled and which vhost backend is used
> - Whether hash reporting is enabled; 3) is incompatible with it
>
> The network device will not be realized if no implementation satisfies
> the requirements.

This makes sense, let's add this to the cover letter.

> >
> > 2) Once we support those three in the future, is QEMU, for example,
> > expected to probe this via TUNGETVNETHASHCAP in the destination and
> > fail the migration?
>
> QEMU is expected to use TUNGETVNETHASHCAP, but it can selectively enable
> hash types with TUNSETVNETHASH to keep migration working.
>
> In summary, this patch series provides a sufficient facility for
> userspace to reconcile extensibility and migration compatibility:
> TUNGETVNETHASHCAP exposes all of the kernel's capabilities and
> TUNSETVNETHASH allows userspace to limit them.
>
> Regards,
> Akihiko Odaki

Fine.

Thanks
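For reference, the probe-then-restrict flow described above could look
roughly like the sketch below in userspace. This is only an illustration:
the argument layouts assumed here (a plain u32 capability mask for
TUNGETVNETHASHCAP and a made-up struct tun_vnet_hash_example for
TUNSETVNETHASH) are not the actual uAPI proposed in this series, which
defines its own structures elsewhere.

#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>  /* assumed to carry the new TUNGETVNETHASHCAP/TUNSETVNETHASH */

/* Made-up argument layout for TUNSETVNETHASH, for illustration only. */
struct tun_vnet_hash_example {
        uint32_t flags;  /* e.g. whether to report the hash to the guest */
        uint32_t types;  /* VIRTIO_NET_RSS_HASH_TYPE_* bits to enable */
};

/*
 * Enable only the hash types the guest actually negotiated, after
 * checking that the kernel supports all of them.  Keeping the enabled
 * set explicit means the guest-visible behaviour stays the same on a
 * destination kernel that supports a superset of these types.
 */
static int example_setup_vnet_hash(int tun_fd, uint32_t wanted_types)
{
        uint32_t supported = 0;
        struct tun_vnet_hash_example cfg = { .types = wanted_types };

        if (ioctl(tun_fd, TUNGETVNETHASHCAP, &supported) < 0)
                return -1;

        if ((wanted_types & supported) != wanted_types) {
                fprintf(stderr, "kernel lacks a requested hash type\n");
                return -1;  /* caller falls back to eBPF or userspace RSS */
        }

        return ioctl(tun_fd, TUNSETVNETHASH, &cfg);
}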
>
> >
> > Thanks
> >
> >
> >
> >> +
> >> +#define VIRTIO_NET_RSS_MAX_KEY_SIZE 40
> >> +
> >> +static inline void virtio_net_toeplitz_convert_key(u32 *input, size_t len)
> >> +{
> >> +        while (len >= sizeof(*input)) {
> >> +                *input = be32_to_cpu((__force __be32)*input);
> >> +                input++;
> >> +                len -= sizeof(*input);
> >> +        }
> >> +}
> >> +
> >> +static inline void virtio_net_toeplitz_calc(struct virtio_net_toeplitz_state *state,
> >> +                                            const __be32 *input, size_t len)
> >> +{
> >> +        while (len >= sizeof(*input)) {
> >> +                for (u32 map = be32_to_cpu(*input); map; map &= (map - 1)) {
> >> +                        u32 i = ffs(map);
> >> +
> >> +                        state->hash ^= state->key[0] << (32 - i) |
> >> +                                       (u32)((u64)state->key[1] >> i);
> >> +                }
> >> +
> >> +                state->key++;
> >> +                input++;
> >> +                len -= sizeof(*input);
> >> +        }
> >> +}
> >> +
> >> +static inline u8 virtio_net_hash_key_length(u32 types)
> >> +{
> >> +        size_t len = 0;
> >> +
> >> +        if (types & VIRTIO_NET_HASH_REPORT_IPv4)
> >> +                len = max(len,
> >> +                          sizeof(struct flow_dissector_key_ipv4_addrs));
> >> +
> >> +        if (types &
> >> +            (VIRTIO_NET_HASH_REPORT_TCPv4 | VIRTIO_NET_HASH_REPORT_UDPv4))
> >> +                len = max(len,
> >> +                          sizeof(struct flow_dissector_key_ipv4_addrs) +
> >> +                          sizeof(struct flow_dissector_key_ports));
> >> +
> >> +        if (types & VIRTIO_NET_HASH_REPORT_IPv6)
> >> +                len = max(len,
> >> +                          sizeof(struct flow_dissector_key_ipv6_addrs));
> >> +
> >> +        if (types &
> >> +            (VIRTIO_NET_HASH_REPORT_TCPv6 | VIRTIO_NET_HASH_REPORT_UDPv6))
> >> +                len = max(len,
> >> +                          sizeof(struct flow_dissector_key_ipv6_addrs) +
> >> +                          sizeof(struct flow_dissector_key_ports));
> >> +
> >> +        return len + sizeof(u32);
> >> +}
> >> +
> >> +static inline u32 virtio_net_hash_report(u32 types,
> >> +                                         const struct flow_keys_basic *keys)
> >> +{
> >> +        switch (keys->basic.n_proto) {
> >> +        case cpu_to_be16(ETH_P_IP):
> >> +                if (!(keys->control.flags & FLOW_DIS_IS_FRAGMENT)) {
> >> +                        if (keys->basic.ip_proto == IPPROTO_TCP &&
> >> +                            (types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4))
> >> +                                return VIRTIO_NET_HASH_REPORT_TCPv4;
> >> +
> >> +                        if (keys->basic.ip_proto == IPPROTO_UDP &&
> >> +                            (types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4))
> >> +                                return VIRTIO_NET_HASH_REPORT_UDPv4;
> >> +                }
> >> +
> >> +                if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv4)
> >> +                        return VIRTIO_NET_HASH_REPORT_IPv4;
> >> +
> >> +                return VIRTIO_NET_HASH_REPORT_NONE;
> >> +
> >> +        case cpu_to_be16(ETH_P_IPV6):
> >> +                if (!(keys->control.flags & FLOW_DIS_IS_FRAGMENT)) {
> >> +                        if (keys->basic.ip_proto == IPPROTO_TCP &&
> >> +                            (types & VIRTIO_NET_RSS_HASH_TYPE_TCPv6))
> >> +                                return VIRTIO_NET_HASH_REPORT_TCPv6;
> >> +
> >> +                        if (keys->basic.ip_proto == IPPROTO_UDP &&
> >> +                            (types & VIRTIO_NET_RSS_HASH_TYPE_UDPv6))
> >> +                                return VIRTIO_NET_HASH_REPORT_UDPv6;
> >> +                }
> >> +
> >> +                if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv6)
> >> +                        return VIRTIO_NET_HASH_REPORT_IPv6;
> >> +
> >> +                return VIRTIO_NET_HASH_REPORT_NONE;
> >> +
> >> +        default:
> >> +                return VIRTIO_NET_HASH_REPORT_NONE;
> >> +        }
> >> +}
> >> +
> >> +static inline void virtio_net_hash_rss(const struct sk_buff *skb,
> >> +                                       u32 types, const u32 *key,
> >> +                                       struct virtio_net_hash *hash)
> >> +{
> >> +        struct virtio_net_toeplitz_state toeplitz_state = { .key = key };
> >> +        struct flow_keys flow;
> >> +        struct flow_keys_basic flow_basic;
> >> +        u16 report;
> >> +
> >> +        if (!skb_flow_dissect_flow_keys(skb, &flow, 0)) {
> >> +                hash->report = VIRTIO_NET_HASH_REPORT_NONE;
> >> +                return;
> >> +        }
> >> +
> >> +        flow_basic = (struct flow_keys_basic) {
> >> +                .control = flow.control,
> >> +                .basic = flow.basic
> >> +        };
> >> +
> >> +        report = virtio_net_hash_report(types, &flow_basic);
> >> +
> >> +        switch (report) {
> >> +        case VIRTIO_NET_HASH_REPORT_IPv4:
> >> +                virtio_net_toeplitz_calc(&toeplitz_state,
> >> +                                         (__be32 *)&flow.addrs.v4addrs,
> >> +                                         sizeof(flow.addrs.v4addrs));
> >> +                break;
> >> +
> >> +        case VIRTIO_NET_HASH_REPORT_TCPv4:
> >> +                virtio_net_toeplitz_calc(&toeplitz_state,
> >> +                                         (__be32 *)&flow.addrs.v4addrs,
> >> +                                         sizeof(flow.addrs.v4addrs));
> >> +                virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
> >> +                                         sizeof(flow.ports.ports));
> >> +                break;
> >> +
> >> +        case VIRTIO_NET_HASH_REPORT_UDPv4:
> >> +                virtio_net_toeplitz_calc(&toeplitz_state,
> >> +                                         (__be32 *)&flow.addrs.v4addrs,
> >> +                                         sizeof(flow.addrs.v4addrs));
> >> +                virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
> >> +                                         sizeof(flow.ports.ports));
> >> +                break;
> >> +
> >> +        case VIRTIO_NET_HASH_REPORT_IPv6:
> >> +                virtio_net_toeplitz_calc(&toeplitz_state,
> >> +                                         (__be32 *)&flow.addrs.v6addrs,
> >> +                                         sizeof(flow.addrs.v6addrs));
> >> +                break;
> >> +
> >> +        case VIRTIO_NET_HASH_REPORT_TCPv6:
> >> +                virtio_net_toeplitz_calc(&toeplitz_state,
> >> +                                         (__be32 *)&flow.addrs.v6addrs,
> >> +                                         sizeof(flow.addrs.v6addrs));
> >> +                virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
> >> +                                         sizeof(flow.ports.ports));
> >> +                break;
> >> +
> >> +        case VIRTIO_NET_HASH_REPORT_UDPv6:
> >> +                virtio_net_toeplitz_calc(&toeplitz_state,
> >> +                                         (__be32 *)&flow.addrs.v6addrs,
> >> +                                         sizeof(flow.addrs.v6addrs));
> >> +                virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
> >> +                                         sizeof(flow.ports.ports));
> >> +                break;
> >> +
> >> +        default:
> >> +                hash->report = VIRTIO_NET_HASH_REPORT_NONE;
> >> +                return;
> >> +        }
> >> +
> >> +        hash->value = toeplitz_state.hash;
> >> +        hash->report = report;
> >> +}
> >> +
> >>  static inline bool virtio_net_hdr_match_proto(__be16 protocol, __u8 gso_type)
> >>  {
> >>          switch (gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> >>
> >> --
> >> 2.48.1
> >>
> >
>
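As a usage reference for the helpers added by this patch, a minimal,
hypothetical in-kernel caller might look like the following (later
patches in the series presumably wire these helpers into tun; the
example_* names and the per-device struct below are invented for
illustration only):

#include <linux/skbuff.h>
#include <linux/string.h>
#include <linux/virtio_net.h>

/* Hypothetical per-device RSS state holding the Toeplitz key. */
struct example_rss_state {
        u32 key[VIRTIO_NET_RSS_MAX_KEY_SIZE / sizeof(u32)];
};

/* Install a key supplied by userspace as a big-endian byte stream. */
static void example_set_key(struct example_rss_state *state,
                            const u8 key_be[VIRTIO_NET_RSS_MAX_KEY_SIZE])
{
        memcpy(state->key, key_be, sizeof(state->key));
        /* The calculation helpers expect host-endian 32-bit key words. */
        virtio_net_toeplitz_convert_key(state->key, sizeof(state->key));
}

/* Fill the hash report for one packet. */
static void example_hash_skb(const struct example_rss_state *state,
                             const struct sk_buff *skb,
                             struct virtio_net_hash *hash)
{
        virtio_net_hash_rss(skb, VIRTIO_NET_SUPPORTED_HASH_TYPES,
                            state->key, hash);
        /*
         * hash->report is VIRTIO_NET_HASH_REPORT_NONE when the flow could
         * not be dissected or no enabled type matched; otherwise
         * hash->value carries the Toeplitz hash to report to the guest.
         */
}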