> Gur Stavi wrote: > > > Gur Stavi wrote: > > > > > Gur Stavi wrote: > > > > > > > Gur Stavi wrote: > > > > > > > > >> @@ -1846,21 +1846,21 @@ static int fanout_add(struct > sock > > > *sk, > > > > > > > struct fanout_args *args) > > > > > > > > >> err = -EINVAL; > > > > > > > > >> > > > > > > > > >> spin_lock(&po->bind_lock); > > > > > > > > >> - if (packet_sock_flag(po, PACKET_SOCK_RUNNING) && > > > > > > > > >> - match->type == type && > > > > > > > > >> + if (match->type == type && > > > > > > > > >> match->prot_hook.type == po->prot_hook.type && > > > > > > > > >> match->prot_hook.dev == po->prot_hook.dev) { > > > > > > > > > > > > > > > > > > Remaining unaddressed issue is that the socket can now be > > > added > > > > > > > > > before being bound. See comment in v1. > > > > > > > > > > > > > > > > I extended the psock_fanout test with unbound fanout test. > > > > > > > > > > > > > > > > As far as I understand, the easiest way to verify bind is > to > > > test > > > > > that > > > > > > > > po->prot_hook.dev != NULL, since we are under a bind_lock > > > anyway. > > > > > > > > But perhaps a more readable and direct approach to test > "bind" > > > > > would be > > > > > > > > to test po->ifindex != -1, as ifindex is commented as > "bound > > > > > device". > > > > > > > > However, at the moment ifindex is not initialized to -1, I > can > > > add > > > > > such > > > > > > > > initialization, but perhaps I do not fully understand all > the > > > > > logic. > > > > > > > > > > > > > > > > Any preferences? > > > > > > > > > > > > > > prot_hook.dev is not necessarily set if a packet socket is > bound. > > > > > > > It may be bound to any device. See dev_add_pack and > ptype_head. > > > > > > > > > > > > > > prot_hook.type, on the other hand, must be set if bound and > is > > > only > > > > > > > modified with the bind_lock held too. > > > > > > > > > > > > > > Well, and in packet_create. But setsockopt PACKET_FANOUT_ADD > also > > > > > > > succeeds in case bind() was not called explicitly first to > bind > > > to > > > > > > > a specific device or change ptype. > > > > > > > > > > > > Please clarify the last paragraph? When you say "also succeeds" > do > > > you > > > > > > mean SHOULD succeed or MAY SUCCEED by mistake if "something" > > > happens > > > > > ??? > > > > > > > > > > I mean it succeeds currently. Which behavior must then be > maintained. > > > > > > > > > > > Do you refer to the following scenario: socket is created with > non- > > > zero > > > > > > protocol and becomes RUNNING "without bind" for all devices. In > > > that > > > > > case > > > > > > it can be added to FANOUT without bind. Is that considered a > bug or > > > > > does > > > > > > the bind requirement for fanout only apply for all-protocol (0) > > > > > sockets? > > > > > > > > > > I'm beginning to think that this bind requirement is not needed. > > > > > > > > I agree with that. I think that is an historical mistake that > socket > > > > becomes implicitly bound to all interfaces if a protocol is defined > > > > during create. Without this bind requirement would make sense. > > > > > > > > > > > > > > All type and dev are valid, even if an ETH_P_NONE fanout group > would > > > > > be fairly useless. > > > > > > > > Fanout is all about RX, I think that refusing fanout for socket > that > > > > will not receive any packet is OK. The condition can be: > > > > if (po->ifindex == -1 || !po->num) > > > > > > Fanout is not limited to sockets bound to a specific interface. > > > This will break existing users. > > > > For specific interface ifindex >= 1 > > For "any interface" ifindex == 0 > > ifindex is -1 only if the socket was created unbound with proto == 0 > > or for the rare race case that during re-bind the new dev became > unlisted. > > For both of these cases fanout should fail. > > The only case where packet_create does not call __register_prot_hook > is if proto == 0. If proto is anything else, the socket will be bound, > whether to a device hook, or ptype_all. I don't think we need this > extra ifindex condition. > Even though "unbound" is an unlikely state for such a socket the code Should still address this state consistently. If do_bind sets ifindex to -1 on the unlikely unlisted scenario so should packet_create on the more likely proto == 0 scenario. > > > > > > Binding to ETH_P_NONE is useless, but we're not going to slow down > > > legitimate users with branches for cases that are harmless. > > > > > > > With "branch", do you refer to performance or something else? > > As I said in other mail, ETH_P_NONE could not be used in a fanout > > before as well because socket cannot become RUNNING with proto == 0. > > Good point. > > > For performance, we removed the RUNNING condition and added this. > > It is not like we need to perform 5M fanout registrations/sec. It is a > > syscall after all. > > It's as much about code complexity as performance. Both the patch and > resulting code should be as small and self-evident as possible. > > Patch v3 introduces a lot of code churn. Did you look at a side by side comparison? There is really very little extra code. > > If we don't care about opening up fanout groups to ETH_P_NONE, then > patch v2 seems sufficient. If explicitly blocking this, the ENXIO > return can be added, but ideally without touching the other lines. > I am not the one to decide if opening it is a good idea but it will be ironic if a patch with the intention to remove the only-RUNNING restriction will end up allowing never-RUNNING sockets into a fanout group. > > > > I realized another possible problem. We should consider adding > ifindex > > > > Field to struct packet_fanout to be used for lookup of an existing > > > match. > > > > There is little sense to bind sockets to different interfaces and > then > > > > put them in the same fanout group. > > > > If you agree, I can prepare a separate patch for that. > > > > > > > > > The type and dev must match that of the fanout group, and once > added > > > > > to a fanout group can no longer be changed (bind will fail). > > > > > > > > > > I briefy considered the reason might be max_num_members > accounting. > > > > > Since f->num_members counts running sockets. But that is not used > > > > > when tracking membership of the group, sk_ref is. Every packet > socket > > > > > whose po->rollover is increased increases this refcount. > > > > > > > > > > > What about using ifindex to detect bind? Initialize it to -1 in > > > > > > packet_create and ensure that packet_do_bind, on success, sets > it > > > > > > to device id or 0? > > > > > > > > > > > > psock_fanout, should probably be extended with scenarios that > test > > > > > > "all devices" and all/specific protocols. Any specific scenario > > > > > > suggestions? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >