On tir, 2017-12-26 at 12:05 +0100, Anders K. Pedersen | Cohaesio wrote: > Hello, > > On one of our border routers, Netfilter is occasionally logging packets > with "OUT=lo" (output interface lo) even though the packet should be > going out via a regular interface. This behavior is present on Linux > 4.13.0 to 4.14.9, and a bisection of the problem points to > > [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly > > as the first bad commit. This commit adds dst_dev_put() calls before > some dst_release() calls, and dst_dev_put() does > > dst->dev = dev_net(dst->dev)->loopback_dev; > > (among other things), which fits the problem we're seeing. > > The essential part of our nftables rule set that shows this behavior is > > chain forward { > type filter hook forward priority 0; > > meta oif { $internal_interfaces } accept > > meta oif lo ip daddr != 127.0.0.0/8 \ > log group 0 snaplen 80 prefix "oif-lo" counter > > ip saddr { $our_ip_series } \ > flow table acct_out \ > { meta oif . rt nexthop . ip saddr timeout 12m counter } \ > accept > > log group 0 snaplen 80 prefix "DROP" counter drop > } > > The router only does stateless packet filtering and no redirection or > rewriting of the packets (connection tracking, NAT, ipvs etc. are not > even compiled for this kernel). > > As a result of this problem we see packets that should be going to an > internal interface (and thus accepted by the first rule above) being > logged and dropped by the last rule. Some examples: > > Dec 22 11:57:02 cix4 oif-lo IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0 > Dec 22 11:57:02 cix4 DROP IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0 > > Dec 22 12:47:07 cix4 oif-lo IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 MARK=0 > Dec 22 12:47:07 cix4 DROP IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 MARK=0 > > Dec 22 12:53:56 cix4 oif-lo IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 MARK=0 > Dec 22 12:53:56 cix4 DROP IN=eth10 OUT=lo MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 MARK=0 > > It also happens for outbound traffic, where the packets are logged and > counted in the acct_out flow table with "meta oif" = "lo", but a > correct "rt nexthop" - an example: > > Dec 22 12:29:13 cix4 oif-lo IN=team0.20 OUT=lo MAC=3c:fd:fe:15:db:a8:00:24:a8:ff:f0:00:08:00 SRC=212.97.129.25 DST=95.166.119.129 LEN=40 TOS=00 PREC=0x00 TTL=62 ID=19481 DF PROTO=TCP SPT=443 DPT=52560 SEQ=3034827396 ACK=2862814901 WINDOW=12618 ACK URGP=0 MARK=0 > > # nft list flow table filter acct_out|tr ',' '\n'|grep lo > flow table acct_out { > "lo" . 94.101.208.217 . 212.97.129.25 expires 3m17s : counter packets 1 bytes 40 > > I don't know if these packets are actually sent out on the correct > outbound interface thanks to the proper nexthop (the MAC= information > in the Netfilter log is from the received packet and thus not useful > here). > > I tried running a tcpdump on the lo interface to see if these packets > would show up there, but during the three days I had it running, it > only logged one such packet, while Netfilter logs 20+ outbound packets > every day, and the one packet logged by tcpdump was *not* logged by > Netfilter. Further testing of the individual parts of the first bad commit shows that the five first additions of the dst_dev_put() call doesn't trigger the problem, while the last one does (also without the first five), so the problematic part is: diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 3dee004..d986d80 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1369,6 +1372,7 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt) prev = cmpxchg(p, orig, rt); if (prev == orig) { if (orig) { + dst_dev_put(&orig->dst); dst_release(&orig->dst); rt_free(orig); } Any help with fixing this would be much appreciated. If there's anything I should try to debug this further, please let me know. Regards, Anders K. Pedersen��.n��������+%������w��{.n����z�����n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�