Re: [PATCH bpf-next 2/3] bpf: allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/29, Andrey Ignatov wrote:
Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> [Tue, 2020-11-17 20:05 -0800]: > On Tue, Nov 17, 2020 at 4:17 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
[..]
>
> I think it is ok, but I need to go through the locking paths more.
> Andrey,
> please take a look as well.

Sorry for delay, I was offline for the last two weeks.
No worries, I was OOO myself last week, thanks for the feedback!

 From the correctness perspective it looks fine to me.

 From the performance perspective I can think of one relevant scenario.
Quite common use-case in applications is to use bind(2) not before
listen(2) but before connect(2) for client sockets so that connection
can be set up from specific source IP and, optionally, port.

Binding to both IP and port case is not interesting since it's already
slow due to get_port().

But some applications do care about connection setup performance and at
the same time need to set source IP only (no port). In this case they
use IP_BIND_ADDRESS_NO_PORT socket option, what makes bind(2) fast
(we've discussed it with Stanislav earlier in [0]).

I can imagine some pathological case when an application sets up tons of
connections with bind(2) before connect(2) for sockets with
IP_BIND_ADDRESS_NO_PORT enabled (that by itself requires setsockopt(2)
though, i.e. socket lock/unlock) and that another lock/unlock to run
bind hook may add some overhead. Though I do not know how critical that
overhead may be and whether it's worth to benchmark or not (maybe too
much paranoia).

[0] https://lore.kernel.org/bpf/20200505182010.GB55644@rdna-mbp/
Even in case of IP_BIND_ADDRESS_NO_PORT, inet[6]_bind() does
lock_sock down the line, so it's not like we are switching
a lockless path to the one with the lock, right?

And in this case, similar to listen, the socket is still uncontended and
owned by the userspace. So that extra lock/unlock should be cheap
enough to be ignored (spin_lock_bh on the warm cache line).

Am I missing something?



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux