On 9/22/23 04:20, Eric Dumazet wrote: > On Fri, Sep 22, 2023 at 1:15 AM Dmitry Safonov <dima@xxxxxxxxxx> wrote: >> >> Hi Eric, >> >> On 9/21/23 18:01, Dmitry Safonov wrote: >>> On 9/21/23 17:59, Eric Dumazet wrote: [..] >>>> Dmitry, please take a look at this bug, we need to fix it before your >>>> patch series. >>> >>> Sure, seems reasonable to me to fix before merging something on top. >> >> It seems to me that it's related to a race between RCU grace period and >> kmemleak scan period. There seems to be a patch [1] that likely fixes >> that, albeit I couldn't verify it as all my attempts to reproduce syzbot >> issue produced only unrelated to TCP-MD5 log: >> > > I doubt this, looking at the repro, which seems to abuse a not often > used feature of TCP (self connect) > > # https://syzkaller.appspot.com/bug?id=323165b5fe193114de7a3a6a8bd16cf3a3c36ecf > # See https://goo.gl/kgGztJ for information about syzkaller reproducers. > #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"none","sandbox_arg":0,"leak":true,"netdev":true,"close_fds":true,"usb":true} > r0 = socket$inet6_tcp(0xa, 0x1, 0x0) > setsockopt$inet6_tcp_TCP_MD5SIG(r0, 0x6, 0xe, > &(0x7f0000000040)={@in6={{0xa, 0x0, 0x0, @local}}, 0x0, 0x0, 0x22, > 0x0, "b05423587c18814d6b1a5f25671d09815a4687d637ffc958defc671aad3d4de8ac7d88560c759d600ab650c07ef0ef162b199da0d017fe6f0ae40cfb4e241cf9a990f20f6b8c2c070a61cfad8a2d2600"}, > 0xd8) > connect$inet6(r0, &(0x7f0000000180)={0xa, 0x4001, 0x0, @ipv4={'\x00', > '\xff\xff', @remote}}, 0x1c) > dup(0xffffffffffffffff) > setsockopt$SO_BINDTODEVICE(r0, 0x1, 0x19, > &(0x7f00000001c0)='ip6_vti0\x00', 0xff4a) > > > > You could not have KMEMLEAK in the kernel, and run the repro a thousand times. > > Then compare /proc/slabinfo before/after. Eric! After some experiments, I'm still standing on the same RCU/kmemleak race. Here's what I did: 1. I couldn't reproduce it on locally-build kernel with the same .config in order to verify that the patch does fix this. Probably the toolchain or any other small bits make a difference. 2. I can easily reproduce it with bzImage from syzcaller. 3. slabinfo does fluctuate for kmalloc-192 even without running any reproducer, so it'd be hard to tell the difference. 4. I went running the reproducer in loop a hundred times: # for i in $(seq 1 100) ; do ./repro ; dmesg -c >> ./dmesg.log ; done the dmesg clean here is because pre-compiled ring-buffer is quite small (I attach the full logs to this mail) 5. `cat /sys/kernel/debug/kmemleak` is empty, I presume it means that the addresses that were reported later got removed from kmemleak lists/tables on kfree(). 6. Curious about addresses reported, I can see that some addresses were reported multiple times, which means that slab got reused, rather than leaked: # grep tcp_md5_do_add -B9 dmesg.log | sed -n 's/.*unreferenced object \([^ ]\+\) .*/\1/p' | sort | uniq -c | sort -n | tail -n 5 2 0xffff8880174b96c0 2 0xffff888017686900 2 0xffff8880179fd720 3 0xffff8880146a8600 3 0xffff888015c226c0 See the logs for 0xffff888015c226c0 address (you can view them in dmesg that I attach): https://gist.github.com/0x7f454c46/dcc7936392a51a789a235eb73df1598c 7. Well, OK, maybe at least one of the addresses reported was leaked? In order to check that, I did: # cat dmesg.log | grep tcp_md5_do_add -B9 | sed -n 's/.*unreferenced object \([^ ]\+\) .*/\1/p' | sort | uniq > addresses # dmesg -c ; for i in $(cat addresses) ; do echo "dump=$i" > /sys/kernel/debug/kmemleak ; dmesg -c >> addresses.kmemleak ; done I attach addresses.kmemleak: the slabs were reused or weren't allocated (I presume, the "Unknown object at 0xffff888019cdcdc0" means that slab is free). 8. Now that I verified that kmemleak was misreporting those addresses, I went on my regular TCP-AO selftests, added one for TCP self-connect and with ftrace I can clearly see that md5 keys/info is deallocated: # cat trace # tracer: function # # entries-in-buffer/entries-written: 5/5 #P:2 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | self-connect_ip-2125 [000] ...1. 6108.468401: tcp_md5_do_add <-tcp_v4_parse_md5_keys self-connect_ip-2125 [000] ...1. 6108.468727: __tcp_md5_do_add <-tcp_md5_do_add self-connect_ip-2125 [000] ...1. 6108.471276: tcp_clear_md5_list <-tcp_v4_destroy_sock kworker/u5:1-2108 [000] ..s1. 6108.475633: tcp_md5sig_info_free_rcu <-rcu_core <idle>-0 [000] ..s2. 6108.598342: tcp_md5_twsk_free_rcu <-rcu_core So, from source code point of view: the test opens socket, sends SYN, receives SYN straight away and kernel decides that it's simultaneous/fast open case. It sends SYN-ACK and establishes as normal. On socket destruction, TCP-MD5 key gets destroyed the regular way. Nothing seems special in this TCP self-connect case. 9. The only interesting part in this experiment is that now I have a TCP self-connect selftest, that by its nature tests simultaneous open, which as I expected works with TCP-MD5, but not with TCP-AO: [ 3412.559472] TCP: AO hash mismatch for (127.0.0.1, 7010)->(127.0.0.1, 7010) SA [ 4115.964926] TCP: AO hash mismatch for [::1]:7010->[::1]:7010 SA L3index: 0 It was expected as tcp_inbound_ao_hash() has : /* Fast-path */ : /* TODO: fix fastopen and simultaneous open (TCPF_SYN_RECV) */ Going to fix this for TCP-AO-v13. Please, let me know if this explanation/investigation looks good to you, so that I can proceed with v13. Thanks, Dmitry
Attachment:
dmesg.log.xz
Description: application/xz
Attachment:
addresses.kmemleak.xz
Description: application/xz