2013/8/26 Julian Anastasov <ja@xxxxxx>:
> On Mon, 26 Aug 2013, Drunkard Zhang wrote:
>
>> Good news, I finally found the crap source, it's keepalived. I tested
>> several times without keepalived in runlevel 3, after kernel boots I
>> add the ipvs service by hand:
>
> OK, I was worried that my recent RCU changes broke
> something in the WRR scheduler and the configuration process.
>
>> ./ipvsadm -C
>> # Clear previous log
>> > /var/log/kern.log
>> sleep 1
>> # Start debug
>> echo 20 > /proc/sys/net/ipv4/vs/debug_level
>> ./ipvsadm -R < /etc/keepalived/rules-with-ops
>> usleep 30000
>> # Stop debug
>> echo 0 > /proc/sys/net/ipv4/vs/debug_level
>>
>> Then add VIP manually, then do ARP announce manually:
>> vs3 ~/pkgs # ip a add 150.164.100.120/32 dev eno1
>> vs3 ~/pkgs # arp-sk -i eno1 -S 150.164.100.120:90:b1:1c:1a:59:46 -d 150.164.100.126
>>
>> After these actions, traffic starts to come in, and all ipvsadm checks
>> are fine, OPS is fine too. So I figured that maybe outdated libipvs in
>> keepalived broke the ipvs in kernel. I'll try to report this to
>> upstream.
>
> OK, I have no more doubts. To summarize,
> here is what I think happened:
>
> - a packet is scheduled while there is a virtual service without
>   the --ops flag. The result is that a UDP connection is
>   created that expires after 5mins by default, if there are
>   no more packets.
>
> - traffic is not stopped, it hits the connection and
>   restarts its timer. As a result, this connection stays
>   forever and forwards traffic to a single server.

This explains why expire time from "ipvsadm -lcn" keeps at 5.00min.

> - as a single connection is used we see that the stats for
>   Conns and CPS rate do not move because we do not create
>   connections anymore, all traffic comes from a single client
>   address and the scheduler is not called.
>
> - there is one variation here: ipvsadm -C is called,
>   dests are moved to the trash list, new rules are
>   added but before the RCU grace period is expired.
>   In such a case IP_VS_DEST_STATE_REMOVING is still set and
>   prevents the same dest from being reused when adding the
>   same dest parameters. In this case the connection will point
>   to an unavailable dest for 5mins and the traffic that hits it
>   will not restart its timer. After 5mins the connection
>   will be removed and the first packet that comes
>   will use the --ops flag. There is a chance for everything
>   to work. So, if new rules are added we have 2
>   situations:
>
>   1. rules reuse old dests and traffic goes to a single server.
>      This happens if the new rules are added after at least
>      10ms (the RCU grace period, in fact), e.g. with
>      usleep 10000 after ipvsadm -C. We have CPS=0 and
>      InPPS above 0 for a single server.
>
>   2. rules allocate a new dest and traffic is stopped
>      for 5mins. This will happen if rules are added
>      immediately after ipvsadm -C (while in the RCU grace period).
>      After 5mins everything works.
>
> - CPS 0 means we are reusing an existing connection
>
> - even if you replace the service or set --ops, the
>   existing connection is still used, even ipvsadm -C
>   can not remove it. There is only one chance: to set
>   expire_nodest_conn=1, to call ipvsadm -C and to wait for
>   the next packet to remove the connection. Then to add
>   all rules again but not before the connection is removed.
>
>> On the other hand, ipvs didn't recover from ipvsadm -C, rmmod ip_vs
>> && ./ipvsadm -R < rules-with-ops is needed (I tested, reloading the ip_vs
>> module could make OPS work). So robustness of IPVS needs improvement.
>
> Some problem? Maybe you refer to the fact that
> connections survive ipvsadm -C and that is what prevented
> your traffic from being scheduled.
>
> So, I see two problems here:
>
> - tools do not set --ops, a connection is created and is
>   reused by all packets from the same client. The trick
>   to add --ops later can not work. Idea: drop traffic
>   before reaching IPVS (-j DROP) until --ops is applied,
>   this way no connections should be created.
>
> - no way to flush connections in IPVS without removing the
>   module because expire_nodest_conn works only when traffic is
>   received. I think your above remark points here.

Again, thanks for your explanation, now I understand all these "weird"
things, it's all because keepalived does not support --ops.
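
The two workarounds described in the thread (flushing the stale connection with expire_nodest_conn, and dropping traffic until the --ops rules are loaded) could be sketched as a small shell script. This is only a sketch under assumptions, not something posted in the thread: the `run` dry-run wrapper, the function names, the use of the raw PREROUTING chain for the DROP rule, and the polling loop are illustrative choices. The VIP and the rules path are the ones from the test commands above. By default the script only prints the commands; DRY_RUN=0 would execute them and needs root.

```shell
#!/bin/sh
# Sketch only. VIP and RULES come from the thread; everything else is assumed.
VIP=${VIP:-150.164.100.120}
RULES=${RULES:-/etc/keepalived/rules-with-ops}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = really run them

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove a stale connection: with expire_nodest_conn=1 the next packet that
# hits a connection whose dest is gone expires it. Traffic must still flow.
flush_stale_conn() {
    run sysctl -w net.ipv4.vs.expire_nodest_conn=1
    run ipvsadm -C
    if [ "$DRY_RUN" != 1 ]; then
        # Poll the connection table until no entry for the VIP remains.
        while ipvsadm -Lcn | grep -qF "$VIP"; do
            sleep 1
        done
    fi
}

# Load the rules while UDP traffic to the VIP is dropped before IPVS sees it,
# so no connection can be created before --ops is in place.
load_rules_guarded() {
    run iptables -t raw -I PREROUTING -d "$VIP" -p udp -j DROP
    run sh -c "ipvsadm -R < $RULES"
    run iptables -t raw -D PREROUTING -d "$VIP" -p udp -j DROP
}

flush_stale_conn
load_rules_guarded
```

The order matters: the flush step relies on incoming packets to expire the old connection, so the DROP rule is only inserted afterwards, around the rule reload.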