Hello, Only few notes below. If all goes well, v4 can be the last version... :) On Wed, 21 Mar 2018, Inju Song wrote: > Implements the Google's Maglev hashing algorithm as a IPVS scheduler. > > Basically it provides consistent hashing but offers some special > features about disruption and load balancing. > > 1) minimal disruption: when the set of destinations changes, > a connection will likely be sent to the same destination > as it was before. > > 2) load balancing: each destination will receive an almost > equal number of connections. > > Seel also for detail: [3.4 Consistent Hasing] in > https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf Above commit message is properly indented but comments in patches 1 and 3 should not be indented, there should not be spaces at left. > Signed-off-by: Inju Song <inju.song@xxxxxxxxxxxxx> > --- > net/netfilter/ipvs/ip_vs_mh.c | 532 ++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 532 insertions(+) > create mode 100644 net/netfilter/ipvs/ip_vs_mh.c > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c > new file mode 100644 > index 0000000..e531100 > --- /dev/null > +++ b/net/netfilter/ipvs/ip_vs_mh.c > @@ -0,0 +1,532 @@ > + IP_VS_DBG_BUF(6, > + "MH: selected unavailable server %s:%d (offset %d), reselecting", > + IP_VS_DBG_ADDR(dest->af, &dest->addr), > + ntohs(dest->port), roffset); As some vars are unsigned int, make sure all %d in debug messages are changed to %u, for consistency. This also includes all ports. I guess, these %d come from the SH scheduler. > +static void ip_vs_mh_state_free(struct rcu_head *head) > +{ > + struct ip_vs_mh_state *s; > + > + s = container_of(head, struct ip_vs_mh_state, rcu_head); > + kfree(s->lookup); > + kfree(s); > +} > + > +static int ip_vs_mh_init_svc(struct ip_vs_service *svc) > +{ > + struct ip_vs_mh_state *s; > + > + /* allocate the MH table for this service */ > + s = kzalloc(sizeof(*s), GFP_KERNEL); > + if (!s) > + return -ENOMEM; > + > + s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup), > + GFP_KERNEL); > + if (!s->lookup) { > + kfree(s); > + return -ENOMEM; > + } > + > + generate_hash_secret(&s->hash1, &s->hash2); > + s->gcd = ip_vs_mh_gcd_weight(svc); > + s->rshift = ip_vs_mh_shift_weight(svc, s->gcd); > + > + svc->sched_data = s; > + IP_VS_DBG(6, > + "MH lookup table (memory=%zdbytes) allocated for current service\n", > + sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE); > + > + /* permutate & populate with current dests */ > + return ip_vs_mh_reassign(s, svc); Looks like we must not set svc->sched_data on error. We can do it like this: int ret; ... IP_VS_DBG() ret = ip_vs_mh_reassign(s, svc); if (ret < 0) { ip_vs_mh_reset(s); /* This is here for completeness */ ip_vs_mh_state_free(&s->rcu_head); return ret; } /* No more failures, attach state */ svc->sched_data = s; return 0; It should be ok because we use svc->sched_data only from entry points, the scheduler does not use it internally. > +} > + > +static void ip_vs_mh_done_svc(struct ip_vs_service *svc) > +{ > + struct ip_vs_mh_state *s = svc->sched_data; > + > + /* got to clean up hash buckets here */ > + ip_vs_mh_reset(s); > + > + /* release the table itself */ > + ip_vs_mh_state_free(&s->rcu_head); Above should be: call_rcu(&s->rcu_head, ip_vs_mh_state_free); As result, the RCU callback will be called after RCU grace period giving chance to RCU readers to stop using the svc->sched_data state. > + IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n", > + sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE); Regards -- Julian Anastasov <ja@xxxxxx>