Re: [PATCH v2 bpf-next 2/3] bpf: Add bpf_sock_destroy kfunc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Feb 28, 2023, at 2:55 PM, Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> 
> On 2/23/23 1:53 PM, Aditi Ghag wrote:
>> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
>> + *
>> + * The helper expects a non-NULL pointer to a full socket. It invokes
>> + * the protocol specific socket destroy handlers.
>> + *
>> + * The helper can only be called from BPF contexts that have acquired the socket
>> + * locks.
>> + *
>> + * Parameters:
>> + * @sock: Pointer to socket to be destroyed
>> + *
>> + * Return:
>> + * On error, may return EPROTONOSUPPORT, EINVAL.
>> + * EPROTONOSUPPORT if protocol specific destroy handler is not implemented.
>> + * 0 otherwise
>> + */
>> +int bpf_sock_destroy(struct sock_common *sock)
>> +{
>> +	/* Validates the socket can be type casted to a full socket. */
>> +	struct sock *sk = sk_to_full_sk((struct sock *)sock);
> 
> If sk != sock, sk is not locked.
> 
> Does it have to do sk_to_full_sk()? From looking at tcp_abort(), it can handle TCP_NEW_SYN_RECV and TCP_TIME_WAIT. The bpf-tcp-iter is iterating the hashtables that may have sk in different states. Ideally, bpf-tcp-iter should be able to use bpf_sock_destroy() to abort the sk in different states also.

I initially added the check for request sockets as tcp_abort references some of the fields outside of `sock_common`, but tcp_abort does have a special handling for both req and time_wait sockets, as you pointed out. So I'll remove the  `sk_to_full_sk()` call.


Eric/Martin:

>Ideally, bpf-tcp-iter should be able to use bpf_sock_destroy() to abort the sk in different states also.

On somewhat of a related note, I ran into an interesting problem while adding more tests to exercise changes in the first commit ("Implement batching for UDP sockets iterator") more. As it turns out, UDP *listening* sockets weren't getting destroyed as client sockets. 

So here is what the test does at a high level - 
1) Start SO_REUSEPORT servers. (I hit same issue for regular UDP listening sockets as well.)
2) Run BPF iterators that destroys sockets (there are only server sockets).
3) Start a regular server that binds on the same port as the ones from (1) with the expectation that it succeeds after (1) sockets were destroyed. The server fails to bind! 

When I debugged the issue, I found that the listening UDP sockets were still lurking around in the hash table even after they were supposed to be destroyed. With the help of kprobes and print statements in the BPF test iterator program, I confirmed that tcp_abort and the internal function calls (specifically, `__udp_disconnect`) were getting invoked as expected, and the `sk_state` was also being set to `TCP_CLOSE`. Upon further investigation, I found that the socket unhash and source port reset wasn't happening. This is the relevant code - https://github.com/torvalds/linux/blob/master/net/ipv4/udp.c#L1988 [1]. When I commented out the `SOCK_BINDPORT_LOCK` check, the new test passed as sockets were getting destroyed correctly.

I didn't observe similar behavior with TCP, and TCP listening sockets were correctly getting destroyed. `tcp_set_state` unhashes sockets unconditionally for `TCP_CLOSE` state.

Can you share insights into why the socket unhashing and source port reset doesn't happen for bind'ed sockets? If that's expected, I suppose we'll need to unhash and reset source ports for sockets getting destroyed, right?
(I wonder if this was an oversight when `__udp_disconnect` was introduced in commit 286c72deabaa240b7eebbd99496ed3324d69f3c0.)


If it's easier, I can push the next version of patches where I've addresses review comments, and added new tests. We can then continue this discussion there. In the latest version, I've modified [1] with a `TCP_CLOSE` state check.

[1] if (!(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) {
		sk->sk_prot->unhash(sk);
		inet->inet_sport = 0;
     }

> 
> Otherwise, the bpf_sock_destroy kfunc is aborting the listener while the bpf prog expects to abort a req_sk.
> 
>> +
>> +	if (!sk)
>> +		return -EINVAL;
>> +
>> +	/* The locking semantics that allow for synchronous execution of the
>> +	 * destroy handlers are only supported for TCP and UDP.
>> +	 */
>> +	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
>> +		return -EOPNOTSUPP;
>> +
>> +	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
>> +}
>> +
>> +__diag_pop()
> 





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux