> -----Original Message-----
> From: Tom Herbert [mailto:tom@xxxxxxxxxxxxxxx]
> Sent: Thursday, January 14, 2016 2:41 PM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx>; One Thousand Gnomes
> <gnomes@xxxxxxxxxxxxxxxxxxx>; David Miller <davem@xxxxxxxxxxxxx>;
> vkuznets@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; KY Srinivasan
> <kys@xxxxxxxxxxxxx>; devel@xxxxxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> struct flow_keys layout
>
> On Thu, Jan 14, 2016 at 11:15 AM, Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Tom Herbert [mailto:tom@xxxxxxxxxxxxxxx]
> >> Sent: Thursday, January 14, 2016 1:49 PM
> >> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> >> Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx>; One Thousand Gnomes
> >> <gnomes@xxxxxxxxxxxxxxxxxxx>; David Miller <davem@xxxxxxxxxxxxx>;
> >> vkuznets@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; KY Srinivasan
> >> <kys@xxxxxxxxxxxxx>; devel@xxxxxxxxxxxxxxxxxxxxxx;
> >> linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> >> struct flow_keys layout
> >>
> >> On Thu, Jan 14, 2016 at 10:35 AM, Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> >> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Eric Dumazet [mailto:eric.dumazet@xxxxxxxxx]
> >> >> Sent: Thursday, January 14, 2016 1:24 PM
> >> >> To: One Thousand Gnomes <gnomes@xxxxxxxxxxxxxxxxxxx>
> >> >> Cc: Tom Herbert <tom@xxxxxxxxxxxxxxx>; Haiyang Zhang
> >> >> <haiyangz@xxxxxxxxxxxxx>; David Miller <davem@xxxxxxxxxxxxx>;
> >> >> vkuznets@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; KY Srinivasan
> >> >> <kys@xxxxxxxxxxxxx>; devel@xxxxxxxxxxxxxxxxxxxxxx;
> >> >> linux-kernel@xxxxxxxxxxxxxxx
> >> >> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> >> >> struct flow_keys layout
> >> >>
> >> >> On Thu, 2016-01-14 at 17:53 +0000, One Thousand Gnomes
> >> >> wrote:
> >> >> > > These results for Toeplitz are not plausible. Given random input you
> >> >> > > cannot expect any hash function to produce such uniform results. I
> >> >> > > suspect either your input data is biased or how your applying the hash
> >> >> > > is.
> >> >> > >
> >> >> > > When I run 64 random IPv4 3-tuples through Toeplitz and Jenkins I get
> >> >> > > something more reasonable:
> >> >> >
> >> >> > IPv4 address patterns are not random. Nothing like it. A long long time
> >> >> > ago we did do a bunch of tuning for network hashes using big porn site
> >> >> > data sets. Random it was not.
> >> >> >
> >> >>
> >> >> I ran my tests with non random IPV4 addresses, as I had 2 hosts,
> >> >> one server, one client. (typical benchmark stuff)
> >> >>
> >> >> The only 'random' part was the ports, so maybe ~20 bits of entropy,
> >> >> considering how we allocate ports during connect() to a given
> >> >> destination to avoid port reuse.
> >> >>
> >> >> > It's probably hard to repeat that exercise now with geo specific routing,
> >> >> > and all the front end caches and redirectors on big sites but I'd
> >> >> > strongly suggest random input is not a good test, and also that you need
> >> >> > to worry more about hash attacks than perfect distributions.
> >> >>
> >> >> Anyway, the exercise is not to find a hash that exactly splits 128 flows
> >> >> into 16 buckets, according to the number of flows per bucket.
> >> >>
> >> >> Maybe only 4 flows are sending at 3Gbits, and others are sending at 100
> >> >> kbits. There is no way the driver can predict the future.
> >> >>
> >> >> This is why we prefer to select a queue given the cpu sending the
> >> >> packet. This permits a natural shift based on actual load, and is the
> >> >> default on linux (see XPS in Documentation/networking/scaling.txt)
> >> >>
> >> >> Only this driver has a selection based on a flow 'hash'.
> >> >
> >> > Also, the port number selection may not be random either. For example,
> >> > the well-known network throughput test tool, iperf, use port numbers with
> >> > equal increment among them. We tested these non-random cases, and found
> >> > the Toeplitz hash has distributed evenly, but Jenkins hash has non-even
> >> > distribution.
> >> >
> >> > I'm aware of the test from Tom Herbert <tom@xxxxxxxxxxxxxxx>, which
> >> > showing similar results of Toeplitz v.s. Jenkins with random inputs.
> >> >
> >> > In summary, the Toeplitz performs better in case of non-random inputs,
> >> > and performs similar to Jenkins in random inputs (which may not be the
> >> > case in real world). So we still prefer to use Toeplitz hash.
> >> >
> >> You are basing your conclusions on one toy benchmark. I don't believe
> >> that an realistically loaded web server is going to consistently give
> >> you tuples that happen to somehow fit into a nice model so that the
> >> bias benefits your load distribution.
> >>
> >> > To minimize the computational overhead, we may consider put the hash
> >> > in a per-connection cache in TCP layer, so it only needs one time
> >> > computation. But, even with the computation overhead at this moment,
> >> > the throughput based on Toeplitz hash is better than Jenkins:
> >> > Throughput (Gbps) comparison:
> >> > #conn   Toeplitz   Jenkins
> >> > 32      26.6       23.2
> >> > 64      32.1       23.4
> >> > 128     29.1       24.1
> >> >
> >> You don't need to do that. We already store a random hash value in the
> >> connection context. If you want to make it non-random then just
> >> replace that with a simple global counter. This will have the exact
> >> same effect that you see in your tests without needing any expensive
> >> computation.
> >
> > Could you point me to the data field of connection context where this
> > hash value is stored? Is it computed only one time?
> >
> sk_txhash in struct sock.
> It is set to a random number on TCP or UDP
> connect call, It can be reset to a different random value when
> connection is seen to be have trouble (sk_rethink_txhash).
>
> Also when you say "Toeplitz performs better in case of non-random
> inputs" please quantify exactly how your input data is not random.
> What header changes with each connection in your test...

Thank you for the info!

For non-random inputs, I used iperf's port selection, which increases the
port number by 2 for each connection. Only the source port numbers differ;
all other values are the same. I also tested some other fixed increments,
and Toeplitz spreads the connections evenly. For real applications, if the
load comes from the local area, the IP/port combinations are likely to have
some non-random patterns.

For our driver, we are thinking of storing the Toeplitz hash in sk_txhash,
so that it only needs to be computed once, and again only when
sk_rethink_txhash is called. The computational overhead is then incurred
almost only once per connection.

Thanks,
- Haiyang
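
For reference, the iperf-style test described above (source port increasing
by 2 per connection, all other fields fixed) can be approximated offline.
The following is a minimal sketch, not the hv_netvsc code: the key, the
addresses, the base port, and the use of "hash % n_queues" for queue
selection are all assumptions made for illustration.

```python
import socket
import struct
from collections import Counter

# Arbitrary 40-byte key for illustration only (assumption); a real test
# should use the key the device or driver is actually configured with.
KEY = bytes((i * 37 + 11) % 256 for i in range(40))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit, XOR in the 32-bit
    window of the key that starts at that bit position."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least len(data) + 4 bytes long")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def flow_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hash input for a TCP/IPv4 flow: addresses then ports, network order."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def bucket_counts(n_conns: int, n_queues: int,
                  base_port: int = 50000, step: int = 2) -> Counter:
    """Count flows per queue when only the source port changes, by a fixed
    step per connection. Addresses and ports are hypothetical values."""
    counts = Counter()
    for i in range(n_conns):
        data = flow_bytes("10.0.0.1", "10.0.0.2", base_port + i * step, 5001)
        # Assumption: the queue is chosen as hash modulo the queue count.
        counts[toeplitz_hash(KEY, data) % n_queues] += 1
    return counts


if __name__ == "__main__":
    for n_conns in (32, 64, 128):
        counts = bucket_counts(n_conns, 16)
        print(n_conns, [counts.get(q, 0) for q in range(16)])
```

bucket_counts(128, 16) returns the number of flows per queue for 128 such
connections. How even the result is depends on the key and on how the hash
is mapped to a queue, so both should be matched to the real driver settings
before drawing conclusions.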