Hello, Julian

Thank you for your detailed comments.

On Sun, Nov 19, 2017 at 06:04:35PM +0200, Julian Anastasov wrote:
>
> 	Hello,
>
> On Sun, 19 Nov 2017, Inju Song wrote:
>
> > Implements the Google's Maglev hashing algorithm as an IPVS scheduler.
> >
> > Basically it provides consistent hashing but offers some special
> > features about disruption and load balancing.
> >
> > 1) minimal disruption: when the set of destinations changes,
> >    a connection will likely be sent to the same destination
> >    as it was before.
> >
> > 2) load balancing: each destination will receive an almost
> >    equal number of connections.
> >
> > See also for details: [3.4 Consistent Hashing] in
> > https://static.googleusercontent.com/media/research.google.com/ko//pubs/archive/44824.pdf
>
> 	I see that the PDF shows implementation only for servers
> with same weight. It would be better to see implementation that
> considers dest->weight.
>

I agree. The current patch only handles the case where every destination
has the same weight (or zero), so I will take dest->weight into account
when I rework the code.

I have also read your comments below about memory allocation failure,
the cached lookup table and setting the table entries to NULL, and I
will fix these and send a v2 patchset.

> > Signed-off-by: Inju Song <inju.song@xxxxxxxxxxxxx>
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 402 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 402 insertions(+)
> >  create mode 100644 net/netfilter/ipvs/ip_vs_mh.c
> >
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > new file mode 100644
> > index 0000000..534a9f5
> > --- /dev/null
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
>
> > +static inline int
> > +ip_vs_mh_populate(struct ip_vs_mh_state *s, struct ip_vs_service *svc,
> > +                  unsigned int **permutation)
> > +{
> > +    int i;
> > +    unsigned int *next;
> > +    struct ip_vs_mh_lookup *entry, *l;
> > +    struct list_head *p;
> > +    struct ip_vs_dest *dest;
> > +    int dcnt;
> > +    unsigned int n, c;
> > +
> > +    dcnt = svc->num_dests;
>
> 	While calling del_dest svc->num_dests can be 0, this should
> be considered in ip_vs_mh_permutate and ip_vs_mh_populate. We should
> set the table with NULLs in such a case.
>
> > +    next = kcalloc(dcnt, sizeof(unsigned int), GFP_KERNEL);
> > +    entry = kcalloc(IP_VS_MH_LOOKUP_SIZE, sizeof(*entry),
> > +                    GFP_KERNEL);
>
> 	It is a current drawback that add_dest, upd_dest and del_dest
> are considered as non-failing methods, we can not fail the operation
> if memory allocation fails, for example. This is bad for add_dest,
> not so for upd_dest and del_dest. In this regard, I think, the MH
> scheduler should not crash if allocation fails.
>
> > +/* Assign all the hash buckets of the specified table with the service.
> > + */
> > +static int
> > +ip_vs_mh_reassign(struct ip_vs_mh_state *s, struct ip_vs_service *svc)
> > +{
> > +    int dcnt;
> > +    unsigned int **permutation;
> > +
> > +    /* flush all the hash entry before assigning mh entry */
> > +    ip_vs_mh_flush(s);
>
> 	ip_vs_mh_reassign should not call ip_vs_mh_flush.
> ip_vs_mh_populate while copying the temp table to s->lookup[]
> should put the refcnt to old dests (as in SH:ip_vs_sh_reassign)
> and then to assign new value. NULL should be assigned if
> svc->num_dests = 0.
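
OK. For v2 my plan is roughly the following. This is only an untested
sketch, not the final patch: ip_vs_mh_reset() and the new_dest[] temp
array are names I made up here, and I assume ip_vs_mh_lookup keeps an
RCU-managed dest pointer like the SH bucket does. Populate builds a
temporary table first and returns without touching s->lookup[] if any
allocation fails; the copy loop puts the refcnt of the old dest and
holds the new one; and the table is reset to NULLs when svc->num_dests
is 0:

    /* Put the old dests and set all buckets to NULL, e.g. when
     * svc->num_dests has dropped to 0 in del_dest.
     */
    static void ip_vs_mh_reset(struct ip_vs_mh_state *s)
    {
        int i;
        struct ip_vs_mh_lookup *l = &s->lookup[0];
        struct ip_vs_dest *dest;

        for (i = 0; i < IP_VS_MH_LOOKUP_SIZE; i++) {
            dest = rcu_dereference_protected(l->dest, 1);
            if (dest) {
                ip_vs_dest_put(dest);
                RCU_INIT_POINTER(l->dest, NULL);
            }
            l++;
        }
    }

    static int
    ip_vs_mh_populate(struct ip_vs_mh_state *s, struct ip_vs_service *svc)
    {
        int i;
        unsigned int *next;
        struct ip_vs_dest **new_dest;   /* temp table, built first */
        struct ip_vs_dest *old;

        if (svc->num_dests == 0) {
            ip_vs_mh_reset(s);
            return 0;
        }

        /* Fail without touching the current lookup table, so the
         * non-failing add_dest/del_dest methods cannot crash the
         * scheduler when memory is tight.
         */
        next = kcalloc(svc->num_dests, sizeof(unsigned int), GFP_KERNEL);
        new_dest = kcalloc(IP_VS_MH_LOOKUP_SIZE, sizeof(*new_dest),
                           GFP_KERNEL);
        if (!next || !new_dest)
            goto out_nomem;

        /* ... run the maglev population over the permutation tables,
         * filling new_dest[] from svc->destinations ...
         */

        /* Copy the temp table into s->lookup[], putting the refcnt of
         * the old dest and holding the new one, as ip_vs_sh_reassign
         * does.
         */
        for (i = 0; i < IP_VS_MH_LOOKUP_SIZE; i++) {
            old = rcu_dereference_protected(s->lookup[i].dest, 1);
            if (old)
                ip_vs_dest_put(old);
            if (new_dest[i])
                ip_vs_dest_hold(new_dest[i]);
            RCU_INIT_POINTER(s->lookup[i].dest, new_dest[i]);
        }

        kfree(next);
        kfree(new_dest);
        return 0;

    out_nomem:
        kfree(next);
        kfree(new_dest);
        return -ENOMEM;
    }

For dest->weight, one approach I am considering is to give each dest a
number of turns in the population loop proportional to its weight, so
the finished lookup table stays roughly weighted. I will try that in
v2 as well.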
>
> > +/* IPVS MH Scheduler structure */
> > +static struct ip_vs_scheduler ip_vs_mh_scheduler = {
> > +    .name = "mh",
> > +    .refcnt = ATOMIC_INIT(0),
> > +    .module = THIS_MODULE,
> > +    .n_list = LIST_HEAD_INIT(ip_vs_mh_scheduler.n_list),
> > +    .init_service = ip_vs_mh_init_svc,
> > +    .done_service = ip_vs_mh_done_svc,
> > +    .add_dest = ip_vs_mh_dest_changed,
> > +    .del_dest = ip_vs_mh_dest_changed,
> > +    .upd_dest = ip_vs_mh_dest_changed,
>
> 	As upd_dest is called only when weight changes, the SH and MH
> schedulers should not handle this method. Their goal is to fill
> the table with all added dests by ignoring health state (it is
> considered only on dest selection).
>
> 	I guess, the correct configuration for SH and MH would be:
>
> 1.1 Add all dests with specific weight, so that the schedulers can
> fill their table based on the weights.
> 1.2 Set weight to 0 for all servers that are not healthy. SH and MH
> should ignore these configuration events.
>
> after the initial configuration:
>
> 2. Traffic will ignore non-healthy servers, fallbacks to healthy ones.
> 3. If other servers become unavailable, change their weight to 0
>
> Regards
>
> --
> Julian Anastasov <ja@xxxxxx>

Regards

--
Inju Song
NAVER Corporation
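
P.S. Regarding upd_dest and the weight-0 configuration above: I will
drop .upd_dest from ip_vs_mh_scheduler in v2 and consider health only
at selection time, along the lines of this untested sketch
(ip_vs_mh_get, ip_vs_mh_hashkey and the unavailability helper are
placeholders of mine modeled on ip_vs_sh.c, not code from this patch):

    static inline bool ip_vs_mh_dest_unavailable(struct ip_vs_dest *dest)
    {
        /* A dest stays in the lookup table even when it is unhealthy;
         * health is only considered here, at selection time, by
         * skipping servers whose weight is 0 or that are overloaded.
         */
        return atomic_read(&dest->weight) <= 0 ||
               dest->flags & IP_VS_DEST_F_OVERLOAD;
    }

    static inline struct ip_vs_dest *
    ip_vs_mh_get(struct ip_vs_service *svc, struct ip_vs_mh_state *s,
                 const union nf_inet_addr *addr, __be16 port)
    {
        unsigned int hash;
        struct ip_vs_dest *dest;

        hash = ip_vs_mh_hashkey(svc->af, addr, port) % IP_VS_MH_LOOKUP_SIZE;
        dest = rcu_dereference(s->lookup[hash].dest);

        return (!dest || ip_vs_mh_dest_unavailable(dest)) ? NULL : dest;
    }

With that, setting a server's weight to 0 takes it out of rotation
without disturbing the rest of the lookup table.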