Re: [PATCH v2] net: rds: acquire refcount on TCP sockets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2022/05/03 22:45, Eric Dumazet wrote:
>> This looks equivalent to the fix presented here:
>>
>> https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@xxxxxxxxxxxxxx/

I retested the fix above using

unshare -n sh -c '
ip link set lo up
iptables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
ip6tables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
telnet 127.0.0.1 16385
dmesg -c
netstat -tanpe' < /dev/null

as a test case, but it seems racy; sometimes timer function is called again and crashes.

[  426.086565][    C2] general protection fault, probably for non-canonical address 0x6b6af3ebcc3b6bc3: 0000 [#1] PREEMPT SMP KASAN
[  426.096339][    C2] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.18.0-rc5-dirty #807
[  426.103769][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  426.111851][    C2] RIP: 0010:__tcp_transmit_skb+0xe72/0x1b80
[  426.117512][    C2] Code: e8 b3 ea dc fd 48 8d 7d 30 45 0f b7 77 30 e8 95 ec dc fd 48 8b 5d 30 48 8d bb b8 02 00 00 e8 85 ec dc fd 48 8b 83 b8 02 00 00 <65> 4c 01 70 58 e9 67 fd ff ff e8 ef 56 ac fd 48 8d bd d0 09 00 00
[  426.124692][    C2] RSP: 0018:ffff888060d09ac8 EFLAGS: 00010246
[  426.126845][    C2] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8880145c8000 RCX: ffffffff838cc28b
[  426.129616][    C2] RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880145c82b8
[  426.132374][    C2] RBP: ffff8880129f8000 R08: 0000000000000000 R09: 0000000000000007
[  426.135077][    C2] R10: ffffffff838cbfd4 R11: 0000000000000001 R12: ffff8880129f8760
[  426.137793][    C2] R13: ffff88800f6e0118 R14: 0000000000000001 R15: ffff88800f6e00e8
[  426.140489][    C2] FS:  0000000000000000(0000) GS:ffff888060d00000(0000) knlGS:0000000000000000
[  426.143525][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  426.145792][    C2] CR2: 000055b5bb0adabc CR3: 000000000e003000 CR4: 00000000000506e0
[  426.148509][    C2] Call Trace:
[  426.149442][    C2]  <IRQ>
[  426.150183][    C2]  ? __tcp_select_window+0x710/0x710
[  426.151457][    C2]  ? __sanitizer_cov_trace_cmp4+0x1c/0x70
[  426.153007][    C2]  ? tcp_current_mss+0x165/0x280
[  426.154245][    C2]  ? tcp_trim_head+0x300/0x300
[  426.155396][    C2]  ? find_held_lock+0x85/0xa0
[  426.156734][    C2]  ? mark_held_locks+0x65/0x90
[  426.157967][    C2]  tcp_write_wakeup+0x2e2/0x340
[  426.159149][    C2]  tcp_send_probe0+0x2a/0x2c0
[  426.160368][    C2]  tcp_write_timer_handler+0x5cb/0x670
[  426.161740][    C2]  tcp_write_timer+0x86/0x250
[  426.162896][    C2]  ? tcp_write_timer_handler+0x670/0x670
[  426.164285][    C2]  call_timer_fn+0x15d/0x5f0
[  426.165481][    C2]  ? add_timer_on+0x2e0/0x2e0
[  426.166667][    C2]  ? lock_downgrade+0x3c0/0x3c0
[  426.167921][    C2]  ? mark_held_locks+0x24/0x90
[  426.169263][    C2]  ? _raw_spin_unlock_irq+0x1f/0x40
[  426.170564][    C2]  ? tcp_write_timer_handler+0x670/0x670
[  426.171920][    C2]  __run_timers.part.0+0x523/0x740
[  426.173181][    C2]  ? call_timer_fn+0x5f0/0x5f0
[  426.174321][    C2]  ? pvclock_clocksource_read+0xdc/0x1a0
[  426.175655][    C2]  run_timer_softirq+0x66/0xe0
[  426.176825][    C2]  __do_softirq+0x1c2/0x670
[  426.177944][    C2]  __irq_exit_rcu+0xf8/0x140
[  426.179120][    C2]  irq_exit_rcu+0x5/0x20
[  426.180150][    C2]  sysvec_apic_timer_interrupt+0x8e/0xc0
[  426.181486][    C2]  </IRQ>
[  426.182180][    C2]  <TASK>
[  426.182845][    C2]  asm_sysvec_apic_timer_interrupt+0x12/0x20

> 
> I think this is still needed for layers (NFS ?) that dismantle their
> TCP sockets whenever a netns
> is dismantled. But RDS case was different, only the listener is a kernel socket.

We can't apply the fix above.

I think that the fundamental problem is that we use net->ns.count for both
"avoiding use-after-free" purpose and "allowing dismantle from user event" purpose.
Why not to use separated counters?




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux