Re: [PATCHv3 ipvs-next 2/3] netfilter: ipvs: Add Maglev consistent hashing scheduler

Inju Song <inju.song@xxxxxxxxxxxxx> · Mon, 26 Mar 2018 14:49:37 +0900



	Hello, 

	Thank you for your comments.

On Sat, Mar 24, 2018 at 01:26:04PM +0200, Julian Anastasov wrote:
> 
> 	Hello,
> 
> 	Only few notes below. If all goes well, v4 can be
> the last version... :)
> 
> On Wed, 21 Mar 2018, Inju Song wrote:
> 
> > Implements the Google's Maglev hashing algorithm as a IPVS scheduler.
> > 
> > Basically it provides consistent hashing but offers some special
> > features about disruption and load balancing.
> > 
> >   1) minimal disruption: when the set of destinations changes,
> >      a connection will likely be sent to the same destination
> >      as it was before.
> > 
> >   2) load balancing: each destination will receive an almost
> >      equal number of connections.
> > 
> > Seel also for detail: [3.4 Consistent Hasing] in
> > https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
> 
> 	Above commit message is properly indented but comments in
> patches 1 and 3 should not be indented, there should not be
> spaces at left.
> 

	ok. I will remove indentions in patch 1 and 3.

> > Signed-off-by: Inju Song <inju.song@xxxxxxxxxxxxx>
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 532 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 532 insertions(+)
> >  create mode 100644 net/netfilter/ipvs/ip_vs_mh.c
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > new file mode 100644
> > index 0000000..e531100
> > --- /dev/null
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > @@ -0,0 +1,532 @@
> 
> > +		IP_VS_DBG_BUF(6,
> > +			      "MH: selected unavailable server %s:%d (offset %d), reselecting",
> > +			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
> > +			      ntohs(dest->port), roffset);
> 
> 	As some vars are unsigned int, make sure all %d in
> debug messages are changed to %u, for consistency. This
> also includes all ports. I guess, these %d come from the
> SH scheduler.
> 

	Yes, these printing format was from SH scheduler..
I will fix these properly.

> > +static void ip_vs_mh_state_free(struct rcu_head *head)
> > +{
> > +	struct ip_vs_mh_state *s;
> > +
> > +	s = container_of(head, struct ip_vs_mh_state, rcu_head);
> > +	kfree(s->lookup);
> > +	kfree(s);
> > +}
> > +
> > +static int ip_vs_mh_init_svc(struct ip_vs_service *svc)
> > +{
> > +	struct ip_vs_mh_state *s;
> > +
> > +	/* allocate the MH table for this service */
> > +	s = kzalloc(sizeof(*s), GFP_KERNEL);
> > +	if (!s)
> > +		return -ENOMEM;
> > +
> > +	s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup),
> > +			    GFP_KERNEL);
> > +	if (!s->lookup) {
> > +		kfree(s);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	generate_hash_secret(&s->hash1, &s->hash2);
> > +	s->gcd = ip_vs_mh_gcd_weight(svc);
> > +	s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
> > +
> > +	svc->sched_data = s;
> > +	IP_VS_DBG(6,
> > +		  "MH lookup table (memory=%zdbytes) allocated for current service\n",
> > +		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
> > +
> > +	/* permutate & populate with current dests */
> > +	return ip_vs_mh_reassign(s, svc);
> 
> 	Looks like we must not set svc->sched_data on error.
> We can do it like this:
> 
> 	int ret;
> 
> 	...
> 	IP_VS_DBG()
> 	ret = ip_vs_mh_reassign(s, svc);
> 	if (ret < 0) {
> 		ip_vs_mh_reset(s); /* This is here for completeness */
> 		ip_vs_mh_state_free(&s->rcu_head);
> 		return ret;
> 	}
> 	/* No more failures, attach state */
> 	svc->sched_data = s;
> 	return 0;
> 
> 	It should be ok because we use svc->sched_data only
> from entry points, the scheduler does not use it internally.
> 

	OK. Above case will be handled.

> > +}
> > +
> > +static void ip_vs_mh_done_svc(struct ip_vs_service *svc)
> > +{
> > +	struct ip_vs_mh_state *s = svc->sched_data;
> > +
> > +	/* got to clean up hash buckets here */
> > +	ip_vs_mh_reset(s);
> > +
> > +	/* release the table itself */
> > +	ip_vs_mh_state_free(&s->rcu_head);
> 
> 	Above should be:
> 
> 	call_rcu(&s->rcu_head, ip_vs_mh_state_free);
> 
> 	As result, the RCU callback will be called after
> RCU grace period giving chance to RCU readers to stop
> using the svc->sched_data state.
> 

	Ok, I missed adding to schedule rcu callback for
freeing the state. It will be fixed. Thanks.

> > +	IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n",
> > +		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
> 

Regards

-- 
Inju Song
NAVER Corporation

--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html