Re: [syzbot] [net?] memory leak in tcp_md5_do_add

Dmitry Safonov <dima@xxxxxxxxxx> · Fri, 22 Sep 2023 23:29:01 +0100

On 9/22/23 04:20, Eric Dumazet wrote:
> On Fri, Sep 22, 2023 at 1:15 AM Dmitry Safonov <dima@xxxxxxxxxx> wrote:
>>
>> Hi Eric,
>>
>> On 9/21/23 18:01, Dmitry Safonov wrote:
>>> On 9/21/23 17:59, Eric Dumazet wrote:
[..]
>>>> Dmitry, please take a look at this bug, we need to fix it before your
>>>> patch series.
>>>
>>> Sure, seems reasonable to me to fix before merging something on top.
>>
>> It seems to me that it's related to a race between RCU grace period and
>> kmemleak scan period. There seems to be a patch [1] that likely fixes
>> that, albeit I couldn't verify it as all my attempts to reproduce syzbot
>> issue produced only unrelated to TCP-MD5 log:
>>
> 
> I doubt this, looking at the repro, which seems to abuse a not often
> used feature of TCP (self connect)
> 
> # https://syzkaller.appspot.com/bug?id=323165b5fe193114de7a3a6a8bd16cf3a3c36ecf
> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"none","sandbox_arg":0,"leak":true,"netdev":true,"close_fds":true,"usb":true}
> r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
> setsockopt$inet6_tcp_TCP_MD5SIG(r0, 0x6, 0xe,
> &(0x7f0000000040)={@in6={{0xa, 0x0, 0x0, @local}}, 0x0, 0x0, 0x22,
> 0x0, "b05423587c18814d6b1a5f25671d09815a4687d637ffc958defc671aad3d4de8ac7d88560c759d600ab650c07ef0ef162b199da0d017fe6f0ae40cfb4e241cf9a990f20f6b8c2c070a61cfad8a2d2600"},
> 0xd8)
> connect$inet6(r0, &(0x7f0000000180)={0xa, 0x4001, 0x0, @ipv4={'\x00',
> '\xff\xff', @remote}}, 0x1c)
> dup(0xffffffffffffffff)
> setsockopt$SO_BINDTODEVICE(r0, 0x1, 0x19,
> &(0x7f00000001c0)='ip6_vti0\x00', 0xff4a)
> 
> 
> 
> You could not have KMEMLEAK in the kernel, and run the repro a thousand times.
> 
> Then compare /proc/slabinfo before/after.

Eric!

After some experiments, I'm still standing on the same RCU/kmemleak
race. Here's what I did:

1. I couldn't reproduce it on locally-build kernel with the same .config
in order to verify that the patch does fix this. Probably the toolchain
or any other small bits make a difference.

2. I can easily reproduce it with bzImage from syzcaller.

3. slabinfo does fluctuate for kmalloc-192 even without running any
reproducer, so it'd be hard to tell the difference.

4. I went running the reproducer in loop a hundred times:
# for i in $(seq 1 100) ; do ./repro ; dmesg -c >> ./dmesg.log ; done
the dmesg clean here is because pre-compiled ring-buffer is quite small
(I attach the full logs to this mail)

5. `cat /sys/kernel/debug/kmemleak` is empty, I presume it means that
the addresses that were reported later got removed from kmemleak
lists/tables on kfree().

6. Curious about addresses reported, I can see that some addresses were
reported multiple times, which means that slab got reused, rather than
leaked:
# grep tcp_md5_do_add -B9 dmesg.log | sed -n 's/.*unreferenced object
\([^ ]\+\) .*/\1/p' | sort | uniq -c | sort -n | tail -n 5
      2 0xffff8880174b96c0
      2 0xffff888017686900
      2 0xffff8880179fd720
      3 0xffff8880146a8600
      3 0xffff888015c226c0

See the logs for 0xffff888015c226c0 address (you can view them in dmesg
that I attach):
https://gist.github.com/0x7f454c46/dcc7936392a51a789a235eb73df1598c

7. Well, OK, maybe at least one of the addresses reported was leaked?
In order to check that, I did:
# cat dmesg.log  | grep tcp_md5_do_add -B9  | sed -n 's/.*unreferenced
object \([^ ]\+\) .*/\1/p' | sort | uniq > addresses
# dmesg -c ; for i in $(cat addresses) ; do echo "dump=$i" >
/sys/kernel/debug/kmemleak ; dmesg -c >> addresses.kmemleak ; done

I attach addresses.kmemleak: the slabs were reused or weren't allocated
(I presume, the "Unknown object at 0xffff888019cdcdc0" means that slab
is free).

8. Now that I verified that kmemleak was misreporting those addresses, I
went on my regular TCP-AO selftests, added one for TCP self-connect and
with ftrace I can clearly see that md5 keys/info is deallocated:
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 5/5   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
 self-connect_ip-2125    [000] ...1.  6108.468401: tcp_md5_do_add
<-tcp_v4_parse_md5_keys
 self-connect_ip-2125    [000] ...1.  6108.468727: __tcp_md5_do_add
<-tcp_md5_do_add
 self-connect_ip-2125    [000] ...1.  6108.471276: tcp_clear_md5_list
<-tcp_v4_destroy_sock
    kworker/u5:1-2108    [000] ..s1.  6108.475633:
tcp_md5sig_info_free_rcu <-rcu_core
          <idle>-0       [000] ..s2.  6108.598342: tcp_md5_twsk_free_rcu
<-rcu_core

So, from source code point of view: the test opens socket, sends SYN,
receives SYN straight away and kernel decides that it's
simultaneous/fast open case. It sends SYN-ACK and establishes as normal.
On socket destruction, TCP-MD5 key gets destroyed the regular way.
Nothing seems special in this TCP self-connect case.

9. The only interesting part in this experiment is that now I have a TCP
self-connect selftest, that by its nature tests simultaneous open, which
as I expected works with TCP-MD5, but not with TCP-AO:
[ 3412.559472] TCP: AO hash mismatch for (127.0.0.1, 7010)->(127.0.0.1,
7010) SA
[ 4115.964926] TCP: AO hash mismatch for [::1]:7010->[::1]:7010 SA
L3index: 0

It was expected as tcp_inbound_ao_hash() has
:	/* Fast-path */
:	/* TODO: fix fastopen and simultaneous open (TCPF_SYN_RECV) */

Going to fix this for TCP-AO-v13.

Please, let me know if this explanation/investigation looks good to you,
so that I can proceed with v13.

Thanks,
             Dmitry
Attachment:
dmesg.log.xz

Description: application/xz
Attachment:
addresses.kmemleak.xz

Description: application/xz