Hello, Thank you for your comments. On Sat, Mar 24, 2018 at 01:26:04PM +0200, Julian Anastasov wrote: > > Hello, > > Only few notes below. If all goes well, v4 can be > the last version... :) > > On Wed, 21 Mar 2018, Inju Song wrote: > > > Implements the Google's Maglev hashing algorithm as a IPVS scheduler. > > > > Basically it provides consistent hashing but offers some special > > features about disruption and load balancing. > > > > 1) minimal disruption: when the set of destinations changes, > > a connection will likely be sent to the same destination > > as it was before. > > > > 2) load balancing: each destination will receive an almost > > equal number of connections. > > > > Seel also for detail: [3.4 Consistent Hasing] in > > https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf > > Above commit message is properly indented but comments in > patches 1 and 3 should not be indented, there should not be > spaces at left. > ok. I will remove indentions in patch 1 and 3. > > Signed-off-by: Inju Song <inju.song@xxxxxxxxxxxxx> > > --- > > net/netfilter/ipvs/ip_vs_mh.c | 532 ++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 532 insertions(+) > > create mode 100644 net/netfilter/ipvs/ip_vs_mh.c > > > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c > > new file mode 100644 > > index 0000000..e531100 > > --- /dev/null > > +++ b/net/netfilter/ipvs/ip_vs_mh.c > > @@ -0,0 +1,532 @@ > > > + IP_VS_DBG_BUF(6, > > + "MH: selected unavailable server %s:%d (offset %d), reselecting", > > + IP_VS_DBG_ADDR(dest->af, &dest->addr), > > + ntohs(dest->port), roffset); > > As some vars are unsigned int, make sure all %d in > debug messages are changed to %u, for consistency. This > also includes all ports. I guess, these %d come from the > SH scheduler. > Yes, these printing format was from SH scheduler.. I will fix these properly. > > +static void ip_vs_mh_state_free(struct rcu_head *head) > > +{ > > + struct ip_vs_mh_state *s; > > + > > + s = container_of(head, struct ip_vs_mh_state, rcu_head); > > + kfree(s->lookup); > > + kfree(s); > > +} > > + > > +static int ip_vs_mh_init_svc(struct ip_vs_service *svc) > > +{ > > + struct ip_vs_mh_state *s; > > + > > + /* allocate the MH table for this service */ > > + s = kzalloc(sizeof(*s), GFP_KERNEL); > > + if (!s) > > + return -ENOMEM; > > + > > + s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup), > > + GFP_KERNEL); > > + if (!s->lookup) { > > + kfree(s); > > + return -ENOMEM; > > + } > > + > > + generate_hash_secret(&s->hash1, &s->hash2); > > + s->gcd = ip_vs_mh_gcd_weight(svc); > > + s->rshift = ip_vs_mh_shift_weight(svc, s->gcd); > > + > > + svc->sched_data = s; > > + IP_VS_DBG(6, > > + "MH lookup table (memory=%zdbytes) allocated for current service\n", > > + sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE); > > + > > + /* permutate & populate with current dests */ > > + return ip_vs_mh_reassign(s, svc); > > Looks like we must not set svc->sched_data on error. > We can do it like this: > > int ret; > > ... > IP_VS_DBG() > ret = ip_vs_mh_reassign(s, svc); > if (ret < 0) { > ip_vs_mh_reset(s); /* This is here for completeness */ > ip_vs_mh_state_free(&s->rcu_head); > return ret; > } > /* No more failures, attach state */ > svc->sched_data = s; > return 0; > > It should be ok because we use svc->sched_data only > from entry points, the scheduler does not use it internally. > OK. Above case will be handled. > > +} > > + > > +static void ip_vs_mh_done_svc(struct ip_vs_service *svc) > > +{ > > + struct ip_vs_mh_state *s = svc->sched_data; > > + > > + /* got to clean up hash buckets here */ > > + ip_vs_mh_reset(s); > > + > > + /* release the table itself */ > > + ip_vs_mh_state_free(&s->rcu_head); > > Above should be: > > call_rcu(&s->rcu_head, ip_vs_mh_state_free); > > As result, the RCU callback will be called after > RCU grace period giving chance to RCU readers to stop > using the svc->sched_data state. > Ok, I missed adding to schedule rcu callback for freeing the state. It will be fixed. Thanks. > > + IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n", > > + sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE); > Regards -- Inju Song NAVER Corporation -- To unsubscribe from this list: send the line "unsubscribe lvs-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html