Felix Fietkau <nbd@xxxxxxxx> writes: > On 2020-12-18 13:41, Toke Høiland-Jørgensen wrote: >> Felix Fietkau <nbd@xxxxxxxx> writes: >> >>> On 2020-12-17 18:26, Toke Høiland-Jørgensen wrote: >>>> Felix Fietkau <nbd@xxxxxxxx> writes: >>>>> If this becomes a problem, I think we should add a similar patch to >>>>> wireguard, which already calls skb_get_hash before encapsulating. >>>>> Other regular tunnels should already get a proper hash, since the flow >>>>> dissector will take care of it. >>>> >>>> But then we'd need to go around adding this to all the places that uses >>>> the hash just to work around a particular piece of broken(ish) hardware. >>>> And we're hard-coding a behaviour in mac80211 that means we'll *always* >>>> recompute the hash, even for hardware that's not similarly broken. >>>> >>>>> The reason I did this patch is because I have a patch to set the hw flow >>>>> hash in the skb on mtk_eth_soc, which does help GRO, but leads to >>>>> collisions on mac80211 fq. >>>> >>>> So wouldn't the right thing to do here be to put a flag into the RX >>>> device that makes the stack clear the hash after using it for GRO? >>> I don't think the hardware is broken, I think fq is simply making >>> assumptions about the hash that aren't met by the hw. >>> >>> The documentation in include/linux/skbuff.h mentions these requirements >>> for the skb hash: >>> * 1) Two packets in different flows have different hash values >>> * 2) Two packets in the same flow should have the same hash value >>> >>> FWIW, I think the 'should' from 2) probably belongs to 1), otherwise it >>> makes no sense. Two packets of the flow must return the same hash, >>> otherwise the hash is broken. I'm assuming this is a typo. >> >> There's some text further down indicating this is deliberate: >> >> * A driver may indicate a hash level which is less specific than the >> * actual layer the hash was computed on. For instance, a hash computed >> * at L4 may be considered an L3 hash. This should only be done if the >> * driver can't unambiguously determine that the HW computed the hash at >> * the higher layer. Note that the "should" in the second property above >> * permits this. >> >> So the way I'm reading that whole section, either the intent is that >> both properties should be fulfilled, or that the first one (being >> collision-free) is more important... > A hash - by definition - cannot be collision free. > But that's beside the point. On my hw, the hash itself seems collision > free for the flows that I'm pushing, but the result of the > reciprocal_scale isn't. > I took another look and figured out the reason for that: > The hw delivers a 14 bit hash. reciprocal_scale assumes that the values > are distributed across the full 32 bit range. So in this case, the lower > bits are pretty much ignored and the result of the reciprocal_scale is 0 > or close to 0, which is what's causing the collisions in fq. Ah, right, that makes sense! > Maybe the assumption that the hash should be distributed across the full > 32 bit range should be documented somewhere :) Yeah, I agree. Maybe just updating that comment in skbuff.h? Do you want to fold such an update into your series? Otherwise I can send a patch once net-next opens... >>> In addition to those properties, fq needs the hash to be >>> cryptographically secure, so that it can use reciprocal_scale to sort >>> flows into buckets without allowing an attacker to craft collisions. >>> That's also the reason why it used to use skb_get_hash_perturb with a >>> random perturbation until we got software hashes based on siphash. >>> >>> I think it's safe to assume that most hardware out there will not >>> provide collision resistant hashes, so in my opinion fq cannot rely on a >>> hardware hash. We don't need to go around and change all places that use >>> the hash, just those that assume a collision resistant one. >> >> I did a quick grep-based survey of uses of skb_get_hash() outside >> drivers - this is what I found (with my interpretations of what they're >> used for): >> >> net/core/dev.c : skb_tx_hash() - selecting TX queue w/reciprocal scale >> net/core/dev.c : RX flow steering, flow limiting >> net/core/dev.c : GRO >> net/core/filter.c : BPF helper >> include/net/ip_tunnels.h : flowi4_multipath_hash - so multipath selection? >> net/ipv{4,6}/route.c : multipath hashing (if l4) >> net/ipv6/seg6_iptunnel : building flow labels >> net/mac80211/tx.c : FQ >> net/mptcp/syncookies : storing cookies (XOR w/net_hash_mix()) >> net/netfilter/nft_hash.c : symhash input (seems to be load balancing) >> net/openvswitch : flow hashing and actions >> net/packet/af_packet.c : PACKET_FANOUT_HASH >> net/sched/sch_*.c : flow hashing for queueing >> >> Apart from GRO it's not obvious to me that a trivially >> attacker-controlled hash is safe in any of those uses? > I looked at some of those uses you mentioned here. > Most of them fit into 2 categories: > 1. Sort into power-of-2 buckets and use hash & (size-1), effectively > using the lower bits only. > 2. Use reciprocal_scale - effectively using the higher bits only. > For the hash that my hw is reporting, type 1 is working and type 2 is > broken. > > So it seems to me that the solution would involve running a simple hash > on the 14 bit values to get the bits distributed to the full 32 bit > range without adding too much bias. > I will do this in the driver and drop this patch. Yes, this seems like a reasonable solution; great! > Thanks for looking into this, You're welcome :) -Toke