RE: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file while wait_memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@xxxxxxxxx]
> Sent: Wednesday, August 17, 2022 8:55 AM
> To: liujian (CE) <liujian56@xxxxxxxxxx>; john.fastabend@xxxxxxxxx;
> jakub@xxxxxxxxxxxxxx; edumazet@xxxxxxxxxx; davem@xxxxxxxxxxxxx;
> yoshfuji@xxxxxxxxxxxxxx; dsahern@xxxxxxxxxx; kuba@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; andrii@xxxxxxxxxx; mykolal@xxxxxx; ast@xxxxxxxxxx;
> daniel@xxxxxxxxxxxxx; martin.lau@xxxxxxxxx; song@xxxxxxxxxx; yhs@xxxxxx;
> kpsingh@xxxxxxxxxx; sdf@xxxxxxxxxx; haoluo@xxxxxxxxxx;
> jolsa@xxxxxxxxxx; shuah@xxxxxxxxxx; bpf@xxxxxxxxxxxxxxx
> Cc: liujian (CE) <liujian56@xxxxxxxxxx>
> Subject: RE: [PATCH bpf-next 1/2] sk_msg: Keep reference on socket file
> while wait_memory
> 
> Liu Jian wrote:
> > Fix the below NULL pointer dereference:
> >
> > [   14.471200] Call Trace:
> > [   14.471562]  <TASK>
> > [   14.471882]  lock_acquire+0x245/0x2e0
> > [   14.472416]  ? remove_wait_queue+0x12/0x50
> > [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
> > [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
> > [   14.474318]  ? remove_wait_queue+0x12/0x50
> > [   14.474907]  remove_wait_queue+0x12/0x50
> > [   14.475480]  sk_stream_wait_memory+0x20d/0x340
> > [   14.476127]  ? do_wait_intr_irq+0x80/0x80
> > [   14.476704]  do_tcp_sendpages+0x287/0x600
> > [   14.477283]  tcp_bpf_push+0xab/0x260
> > [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
> > [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
> > [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
> > [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
> > [   14.480311]  sock_sendmsg+0x2d/0x40
> > [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
> > [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
> > [   14.482048]  ___sys_sendmsg+0x78/0xb0
> > [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
> > [   14.483215]  ? __do_fault+0x2a/0x1a0
> > [   14.483738]  ? do_fault+0x15e/0x5d0
> > [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
> > [   14.484874]  ? lock_is_held_type+0xdf/0x130
> > [   14.485474]  ? find_held_lock+0x2d/0x90
> > [   14.486046]  ? __sys_sendmsg+0x41/0x70
> > [   14.486587]  __sys_sendmsg+0x41/0x70
> > [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
> > [   14.487822]  do_syscall_64+0x34/0x80
> > [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > The test scene as following flow:
> > thread1                               thread2
> > -----------                           ---------------
> >  tcp_bpf_sendmsg
> >   tcp_bpf_send_verdict
> >    tcp_bpf_sendmsg_redir              sock_close
> >     tcp_bpf_push_locked                 __sock_release
> >      tcp_bpf_push                         //inet_release
> >       do_tcp_sendpages                    sock->ops->release
> >        sk_stream_wait_memory          	   // tcp_close
> >           sk_wait_event                      sk->sk_prot->close
> >            release_sock(__sk);
> >             ***
> >
> >                                                 lock_sock(sk);
> >                                                   __tcp_close
> >                                                     sock_orphan(sk)
> >                                                       sk->sk_wq  = NULL
> >                                                 release_sock
> >             ****
> >            lock_sock(__sk);
> >           remove_wait_queue(sk_sleep(sk), &wait);
> >              sk_sleep(sk)
> >              //NULL pointer dereference
> >              &rcu_dereference_raw(sk->sk_wq)->wait
> >
> > While waiting for memory in thread1, the socket is released with its
> > wait queue because thread2 has closed it. This caused by
> > tcp_bpf_send_verdict didn't increase the f_count of psock->sk_redir-
> >sk_socket->file in thread1.
> >
> > Avoid it by keeping a reference to the socket file while redirect sock
> > wait send memory. Refer to [1].
> >
> > [1]
> > https://lore.kernel.org/netdev/20190211090949.18560-1-jakub@cloudflare
> > .com/
> >
> > Signed-off-by: Liu Jian <liujian56@xxxxxxxxxx>
> 
> Thanks for the detailed commit message its necessary to understand the
> problem without spending hours deciphering it myself.
> 
> When I looked at [1] we solved a simliar problem by using the
> MSG_DONTWAIT flag so that the error was pushed back to the sending.
> 
> Can we do the same thing here? The nice bit here is the error would get all
> the way back to the sending socket so userspace could decide how to handle
> it? Did I miss something?
> 
[1] works in sk_psock_backlog function, it can not be detected by the userspace app.
But here, the problem is that app wants this to be a blocked system call.
If the MSG_DONTWAIT flag is forcibly added, this changes the function behavior.

> > ---
> >  net/ipv4/tcp_bpf.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index
> > a1626afe87a1..201375829367 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -125,9 +125,17 @@ static int tcp_bpf_push_locked(struct sock *sk,
> > struct sk_msg *msg,  {
> >  	int ret;
> >
> > +	/* Hold on to socket wait queue. */
> > +	if (sk->sk_socket && sk->sk_socket->file)
> > +		get_file(sk->sk_socket->file);
> > +
> >  	lock_sock(sk);
> >  	ret = tcp_bpf_push(sk, msg, apply_bytes, flags, uncharge);
> >  	release_sock(sk);
> > +
> > +	if (sk->sk_socket && sk->sk_socket->file)
> > +		fput(sk->sk_socket->file);
> > +
> >  	return ret;
> >  }
> >
> > --
> > 2.17.1
> >
> 





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux