Re: [PATCH 1/2] bpf: Add socket destroy capability

Thanks for the reviews.

@martin: Thanks, I'll look into the missing batching-related logic for UDP. Appreciate the pointers.
OT: While we can use the has_current_bpf_ctx macro to skip taking sock locks in the BPF context, how is that locking assumption enforced in the code (e.g., the setsockopt helper assumes that the BPF iter prog has already acquired the relevant locks)? We could potentially use lockdep_sock_is_held for the sock lock. It's a debug-only check, but I suppose lockdep is enabled in selftests.
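
For illustration, the kind of debug-only check I have in mind is roughly the below (just a sketch, not part of the patch; assert_sk_locked_for_destroy is an invented name, and it needs linux/debug_locks.h and net/sock.h):

/* Sketch only: warn if the sock lock is not actually held when the
 * destroy helper runs, whether it was taken by the syscall path or by
 * the bpf-iter code before invoking the prog.
 */
static void assert_sk_locked_for_destroy(struct sock *sk)
{
#ifdef CONFIG_LOCKDEP
	WARN_ON_ONCE(debug_locks && !lockdep_sock_is_held(sk));
#endif
}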

> On Dec 21, 2022, at 9:08 PM, Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> 
> On 12/16/22 5:57 PM, Aditi Ghag wrote:
>> The socket destroy helper is used to forcefully terminate sockets
>> from certain BPF contexts. We plan to use the capability in Cilium
>> to force client sockets to reconnect when their remote
>> load-balancing backends are deleted. The other use case is
>> on-the-fly policy enforcement, where existing socket connections
>> that are no longer allowed by policy need to be terminated.
>> The helper is currently exposed to iterator type BPF programs,
>> where users can filter and terminate a set of sockets.
>> Sockets are destroyed asynchronously using the work queue
>> infrastructure. This preserves the current locking semantics
>> within the socket destroy handlers, as BPF iterators invoking the
>> helper acquire *sock* locks. It also allows the helper to be
>> invoked from non-sleepable contexts.
>> The other approach, skipping acquiring locks by passing an
>> argument to the `diag_destroy` handler, didn't work out well for
>> UDP, as the UDP abort function internally invokes another function
>> that ends up acquiring the *sock* lock.
>> While there are sleepable BPF iterators, these are limited to only
>> certain map types.
> 
> A bpf-iter program can be sleepable or non-sleepable. Both sleepable and non-sleepable tcp/unix bpf-iter programs have been able to call bpf_setsockopt() synchronously. bpf_setsockopt() also requires the sock lock to be held first. The situation when calling '.diag_destroy' from bpf-iter should not be much different from calling bpf_setsockopt(). From a quick look at tcp_abort and udp_abort, I don't see that they might sleep either, but you may want to double check. Even if '.diag_destroy' were only usable from a sleepable 'bpf-iter' because it might sleep, the common bpf map types are already available to sleepable programs.
> 
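Ack. For my own understanding, the kind of iterator program this enables is roughly the below, loosely modeled on the existing setsockopt usage (untested sketch; set_cc is an invented name, and SOL_TCP/TCP_CONGESTION are defined inline instead of pulling in the selftest headers):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define SOL_TCP		6
#define TCP_CONGESTION	13

SEC("iter/tcp")
int set_cc(struct bpf_iter__tcp *ctx)
{
	struct sock_common *sk_common = ctx->sk_common;
	struct tcp_sock *tp;
	char cc[] = "cubic";

	if (!sk_common)
		return 0;

	/* The kernel tcp iter already holds the sock lock here, so
	 * bpf_setsockopt() can be called directly.
	 */
	tp = bpf_skc_to_tcp_sock(sk_common);
	if (!tp)
		return 0;

	bpf_setsockopt(tp, SOL_TCP, TCP_CONGESTION, cc, sizeof(cc));
	return 0;
}

char _license[] SEC("license") = "GPL";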
> At the kernel side, the tcp and unix iter acquire lock_sock() first (e.g. in bpf_iter_tcp_seq_show()) before calling the bpf-iter prog. In the kernel setsockopt code (e.g. do_ip_setsockopt()), sockopt_lock_sock() is used, which avoids taking the lock when it has already been guaranteed by the bpf running context.
> 
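That makes sense. So the setsockopt path effectively does something like this (paraphrasing include/net/sock.h from memory, may not match the exact code):

static inline void sockopt_lock_sock(struct sock *sk)
{
	/* When a bpf prog (e.g. a bpf-iter calling bpf_setsockopt())
	 * reaches this point, the sock has already been locked by the
	 * bpf running context, so skip taking it again.
	 */
	if (has_current_bpf_ctx())
		return;

	lock_sock(sk);
}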
> For udp, I don't see how udp_abort acquires the sock lock differently from tcp_abort. I assume the actual problem seen in udp_abort is related to the '->unhash()' part, which acquires the udp_table's bucket lock. This is a problem for the udp bpf-iter only because the udp bpf-iter does not release the udp_table's bucket lock before calling the bpf prog. The tcp (and unix) bpf-iter releases the bucket lock first before calling the bpf prog. This was done explicitly to allow acquiring the sock lock before calling the bpf prog, because otherwise there would be a lock ordering issue. Hence, for this reason, bpf_setsockopt() is only available to the tcp and unix bpf-iter but not the udp bpf-iter. The udp-iter needs a change similar to the one made for the tcp and unix iter (https://lore.kernel.org/bpf/20210701200535.1033513-1-kafai@xxxxxx/): batch, release the bucket's lock, lock the sock, and then call the bpf prog. This will allow udp-iter to call bpf_setsockopt() like its tcp and unix counterparts, and will also allow a udp bpf-iter prog to directly do '.diag_destroy'.
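
Understood, thanks. Just to confirm I'm reading the ordering right, the udp-iter change would have roughly this shape (names and the fixed-size batch are invented for illustration, not the actual iter code; relies on net/udp.h and net/sock.h):

static void udp_iter_run_bucket(struct udp_hslot *hslot,
				int (*run_prog)(struct sock *sk))
{
	struct sock *batch[32];	/* simplified: real code would resize */
	struct sock *sk;
	int n = 0, i;

	/* 1. Collect refs to the bucket's sockets under the bucket lock. */
	spin_lock_bh(&hslot->lock);
	sk_for_each(sk, &hslot->head) {
		if (n == ARRAY_SIZE(batch))
			break;
		sock_hold(sk);
		batch[n++] = sk;
	}
	spin_unlock_bh(&hslot->lock);

	/* 2. With the bucket lock released, each socket can be locked and
	 *    the bpf prog run without a bucket-lock vs. sock-lock ordering
	 *    issue, so helpers like bpf_setsockopt() (or a destroy helper
	 *    going through .diag_destroy) are safe to call.
	 */
	for (i = 0; i < n; i++) {
		sk = batch[i];
		lock_sock(sk);
		run_prog(sk);
		release_sock(sk);
		sock_put(sk);
	}
}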




