Re: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file while wait_memory

Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> · Fri, 19 Aug 2022 10:39:07 +0200

On Mon, Aug 15, 2022 at 10:33 AM +08, Liu Jian wrote:
> Fix the below NULL pointer dereference:
>
> [   14.471200] Call Trace:
> [   14.471562]  <TASK>
> [   14.471882]  lock_acquire+0x245/0x2e0
> [   14.472416]  ? remove_wait_queue+0x12/0x50
> [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
> [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
> [   14.474318]  ? remove_wait_queue+0x12/0x50
> [   14.474907]  remove_wait_queue+0x12/0x50
> [   14.475480]  sk_stream_wait_memory+0x20d/0x340
> [   14.476127]  ? do_wait_intr_irq+0x80/0x80
> [   14.476704]  do_tcp_sendpages+0x287/0x600
> [   14.477283]  tcp_bpf_push+0xab/0x260
> [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
> [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
> [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
> [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
> [   14.480311]  sock_sendmsg+0x2d/0x40
> [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
> [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
> [   14.482048]  ___sys_sendmsg+0x78/0xb0
> [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
> [   14.483215]  ? __do_fault+0x2a/0x1a0
> [   14.483738]  ? do_fault+0x15e/0x5d0
> [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
> [   14.484874]  ? lock_is_held_type+0xdf/0x130
> [   14.485474]  ? find_held_lock+0x2d/0x90
> [   14.486046]  ? __sys_sendmsg+0x41/0x70
> [   14.486587]  __sys_sendmsg+0x41/0x70
> [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
> [   14.487822]  do_syscall_64+0x34/0x80
> [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> The test scene as following flow:
> thread1                               thread2
> -----------                           ---------------
>  tcp_bpf_sendmsg
>   tcp_bpf_send_verdict
>    tcp_bpf_sendmsg_redir              sock_close
>     tcp_bpf_push_locked                 __sock_release
>      tcp_bpf_push                         //inet_release
>       do_tcp_sendpages                    sock->ops->release
>        sk_stream_wait_memory          	   // tcp_close
>           sk_wait_event                      sk->sk_prot->close
>            release_sock(__sk);
>             ***
>
>                                                 lock_sock(sk);
>                                                   __tcp_close
>                                                     sock_orphan(sk)
>                                                       sk->sk_wq  = NULL
>                                                 release_sock
>             ****
>            lock_sock(__sk);
>           remove_wait_queue(sk_sleep(sk), &wait);
>              sk_sleep(sk)
>              //NULL pointer dereference
>              &rcu_dereference_raw(sk->sk_wq)->wait
>
> While waiting for memory in thread1, the socket is released with its wait
> queue because thread2 has closed it. This caused by tcp_bpf_send_verdict
> didn't increase the f_count of psock->sk_redir->sk_socket->file in thread1.

I'm not sure about this approach. Keeping a closed sock file alive, just
so we can wakeup from sleep, seems like wasted effort.

__tcp_close sets sk->sk_shutdown = RCV_SHUTDOWN | SEND_SHUTDOWN. So we
will return from sk_stream_wait_memory via the do_error path.

SEND_SHUTDOWN might be set because socket got closed and orphaned - dead
and detached from its file, like in this case.

So, IMHO, we should check if SOCK_DEAD flag is set on wakeup due to
SEND_SHUTDOWN in sk_stream_wait_memory, before accessing the wait queue.

[...]