Re: [PATCH v3 bpf-next 3/5] [RFC] net: Skip taking lock in BPF context

Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx> · Tue, 21 Mar 2023 14:43:43 -0700

> On Mar 21, 2023, at 2:37 PM, Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx> wrote:
> 
> 
> 
>> On Mar 21, 2023, at 2:31 PM, Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>> 
>> On 03/21, Aditi Ghag wrote:
>>> When sockets are destroyed in the BPF iterator context, sock
>>> lock is already acquired, so skip taking the lock. This allows
>>> TCP listening sockets to be destroyed from BPF programs.
>> 
>>> Signed-off-by: Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx>
>>> ---
>>> net/ipv4/inet_hashtables.c | 9 ++++++---
>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>>> index e41fdc38ce19..5543a3e0d1b4 100644
>>> --- a/net/ipv4/inet_hashtables.c
>>> +++ b/net/ipv4/inet_hashtables.c
>>> @@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
>>> 		/* Don't disable bottom halves while acquiring the lock to
>>> 		 * avoid circular locking dependency on PREEMPT_RT.
>>> 		 */
>>> -		spin_lock(&ilb2->lock);
>>> +		if (!has_current_bpf_ctx())
>>> +			spin_lock(&ilb2->lock);
>>> 		if (sk_unhashed(sk)) {
>>> -			spin_unlock(&ilb2->lock);
>>> +			if (!has_current_bpf_ctx())
>>> +				spin_unlock(&ilb2->lock);
>> 
>> That's bucket lock, why do we have to skip it?
> 
> Because we take the bucket lock while iterating UDP sockets. See the first commit in this series around batching. But not all BPF contexts that could invoke this function may not acquire the lock, so we can't always skip it. 

Sorry, my buttery fingers hit the sent button too soon. You are right, it's the bucket and not *sock* lock. The commit that adds batching releases the bucket lock. I'll take a look at the stack trace again. 

> 
>> 
>>> 			return;
>>> 		}
>> 
>>> @@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
>> 
>>> 		__sk_nulls_del_node_init_rcu(sk);
>>> 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>>> -		spin_unlock(&ilb2->lock);
>>> +		if (!has_current_bpf_ctx())
>>> +			spin_unlock(&ilb2->lock);
>>> 	} else {
>>> 		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
>> 
>>> --
>>> 2.34.1