On Tue, May 10, 2022 at 2:17 AM Julian Anastasov <ja@xxxxxx> wrote: > > > Hello, > > On Mon, 9 May 2022, menglong8.dong@xxxxxxxxx wrote: > > > From: Menglong Dong <imagedong@xxxxxxxxxxx> > > > > For now, the start of the RR scheduler is in the order of dest > > service added, it will result in imbalance if the load balance > > ...order of added destinations,... > > > is done in client side and long connect is used. > > ..."long connections are used". Is this a case where > small number of connections are used? And the two connections > relatively overload the real servers? > > > For example, we have client1, client2, ..., client5 and real service > > service1, service2, service3. All clients have the same ipvs config, > > and each of them will create 2 long TCP connect to the virtual > > service. Therefore, all the clients will connect to service1 and > > service2, leaving service3 free. > > You mean, there are many IPVS directors with same > config and each director gets 2 connections? Third connection > will get real server #3, right ? Also, are the clients local > to the director? > Sorry to have missed your message here. Yeah, this is what I mean. And in my case, the directors are local the to client, and each client only have 2 connections. If the 3th connection happens, it will get #3 real server. And all directors connected to #1 and #2 real servers, resulting in overload. > > Fix this by randomize the start of dest service to RR scheduler when > > ..."randomizing the starting destination when" > > > IP_VS_SVC_F_SCHED_RR_RANDOM is set. > > > > Signed-off-by: Menglong Dong <imagedong@xxxxxxxxxxx> > > --- > > include/uapi/linux/ip_vs.h | 2 ++ > > net/netfilter/ipvs/ip_vs_rr.c | 25 ++++++++++++++++++++++++- > > 2 files changed, 26 insertions(+), 1 deletion(-) > > > > diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h > > index 4102ddcb4e14..7f74bafd3211 100644 > > --- a/include/uapi/linux/ip_vs.h > > +++ b/include/uapi/linux/ip_vs.h > > @@ -28,6 +28,8 @@ > > #define IP_VS_SVC_F_SCHED_SH_FALLBACK IP_VS_SVC_F_SCHED1 /* SH fallback */ > > #define IP_VS_SVC_F_SCHED_SH_PORT IP_VS_SVC_F_SCHED2 /* SH use port */ > > > > +#define IP_VS_SVC_F_SCHED_RR_RANDOM IP_VS_SVC_F_SCHED1 /* random start */ > > + > > /* > > * Destination Server Flags > > */ > > diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c > > index 38495c6f6c7c..e309d97bdd08 100644 > > --- a/net/netfilter/ipvs/ip_vs_rr.c > > +++ b/net/netfilter/ipvs/ip_vs_rr.c > > @@ -22,13 +22,36 @@ > > > > #include <net/ip_vs.h> > > > > +static void ip_vs_rr_random_start(struct ip_vs_service *svc) > > +{ > > + struct list_head *cur; > > + u32 start; > > + > > + if (!(svc->flags | IP_VS_SVC_F_SCHED_RR_RANDOM) || > > | -> & > > > + svc->num_dests <= 1) > > + return; > > + > > + spin_lock_bh(&svc->sched_lock); > > + start = get_random_u32() % svc->num_dests; > > May be prandom is more appropriate for non-crypto purposes. > Also, not sure if it is a good idea to limit the number of steps, > eg. to 128... > > start = prandom_u32_max(min(svc->num_dests, 128U)); > > or just use > > start = prandom_u32_max(svc->num_dests); > > Also, this line can be before the spin_lock_bh. > > > + cur = &svc->destinations; > > cur = svc->sched_data; > > ... and to start from current svc->sched_data because > we are called for every added dest. Better to jump 0..127 steps > ahead, to avoid delay with long lists? > > > + while (start--) > > + cur = cur->next; > > + svc->sched_data = cur; > > + spin_unlock_bh(&svc->sched_lock); > > +} > > > > static int ip_vs_rr_init_svc(struct ip_vs_service *svc) > > { > > svc->sched_data = &svc->destinations; > > + ip_vs_rr_random_start(svc); > > return 0; > > } > > > > +static int ip_vs_rr_add_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest) > > +{ > > + ip_vs_rr_random_start(svc); > > + return 0; > > +} > > > > static int ip_vs_rr_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest) > > { > > @@ -104,7 +127,7 @@ static struct ip_vs_scheduler ip_vs_rr_scheduler = { > > .module = THIS_MODULE, > > .n_list = LIST_HEAD_INIT(ip_vs_rr_scheduler.n_list), > > .init_service = ip_vs_rr_init_svc, > > - .add_dest = NULL, > > + .add_dest = ip_vs_rr_add_dest, > > .del_dest = ip_vs_rr_del_dest, > > .schedule = ip_vs_rr_schedule, > > }; > > -- > > 2.36.0 > > Regards > > -- > Julian Anastasov <ja@xxxxxx> >