Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED,
and all use cases now rely on the socket lock. Because of this, the
ref-count on the namespace held by the socket object suffices to keep
the namespace in existence, so we don't need to ref-count the
namespace in PPPoE. The flush_lock is gone.

--
Michal Ostrowski
mostrows@xxxxxxxxx


On Mon, Oct 19, 2009 at 1:44 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> Michal Ostrowski wrote:
>> On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>>> Michal Ostrowski wrote:
>>>> Here's a bigger patch that just gets rid of flush_lock altogether.
>>>>
>>>> We were seeing oopses due to net namespaces going away while we were using
>>>> them, which turns out to be simply due to the fact that pppoe wasn't claiming ref
>>>> counts properly.
>>>>
>>>> Fixing this requires that adding and removing entries in the per-net hash table
>>>> increment and decrement the ref count. This also allows us to
>>>> get rid of the flush_lock since we can now depend on the existence of
>>>> "pn->hash_lock".
>>>>
>>>> We also have to be careful when flushing devices that removal of a hash table
>>>> entry may bring the net namespace refcount to 0.
>>>>
>>> Your patch is mangled (tabulation -> white spaces),
>>
>> Patch mangling was due to mailer interactions, I'll attach a clean
>> version here, no more inlining.
>>
>>> and I don't believe namespace refcount can reach 0 inside pppoe_flush_dev(),
>>> it would be a bug from core network code.
>>>
>>
>> From the original oops I was able to deduce that the namespace somehow
>> managed to get destroyed during the interval where we dropped locks.
>> If that's not due to the release_sock() call in pppoe_flush_dev()
>> triggering a cleanup then I'd have to assume that it's due to a
>> secondary actor closing the socket in parallel, but that in turn would
>> point to issues with the flush_lock. Having said that, the thrust of
>> this patch remains valid; it just means I don't need to inc the ref
>> count in pppoe_flush_dev().
>>
>> Do you agree?
>>
>
> Not really :)
>
> I don't believe you should care about the namespace, and/or mess with its refcount at all.
>
> Please don't use maybe_get_net(): this function should never be used in drivers/net.
>
> You can add a BUG_ON(dev_net(xxxx)->count <= 0) if you really want, but if this
> assertion is false, this is not because of pppoe.
>
>
> 	lock_sock(sk);
> @@ -653,10 +642,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
> 	if (stage_session(po->pppoe_pa.sid)) {
> 		pppox_unbind_sock(sk);
> 		if (po->pppoe_dev) {
> -			pn = pppoe_pernet(dev_net(po->pppoe_dev));
> +			struct net *old = dev_net(po->pppoe_dev);
> +			pn = pppoe_pernet(old);
> 			delete_item(pn, po->pppoe_pa.sid,
> 				po->pppoe_pa.remote, po->pppoe_ifindex);
> 			dev_put(po->pppoe_dev);
> +			put_net(old);
> 		}
> 		memset(sk_pppox(po) + 1, 0,
> 		       sizeof(struct pppox_sock) - sizeof(struct sock));
>
>
> There is still a race here, since you do a dev_put(po->pppoe_dev) without any lock held.
>
> So pppoe_flush_dev() can run concurrently and dev_put(po->pppoe_dev) at the same time.
>
> In fact pppoe_flush_dev() can change po->pppoe_dev at any time, so you should check
> all occurrences of po->pppoe_dev use in the code and check whether appropriate locking is done.
>
> pppoe_rcv_core() is not safe
> pppoe_ioctl() is not safe
> pppoe_sendmsg() is not safe
> __pppoe_xmit() is not safe
>
>
Attachment:
0001-PPPoE-Fix-flush-close-races.patch
Description: Binary data
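
For readers without the attachment, here is a rough userspace sketch of the
guard pattern described at the top of this message: the device pointer is
only read, set or dropped while the socket lock is held and the connected
flag is set. It is not taken from the patch. A pthread mutex stands in for
lock_sock()/release_sock(), a plain counter stands in for
dev_hold()/dev_put(), and every model_* name is hypothetical.

```c
/* Userspace model only; none of the model_* names are kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_CONNECTED 0x1	/* plays the role of PPPOX_CONNECTED */

struct model_dev {
	int refcnt;		/* stands in for the net_device refcount */
};

struct model_sock {
	pthread_mutex_t lock;	/* stands in for the socket lock */
	unsigned int state;	/* stands in for sk->sk_state */
	struct model_dev *dev;	/* stands in for po->pppoe_dev */
};

static void model_sock_init(struct model_sock *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->state = 0;
	s->dev = NULL;
}

/* Connect path: take a device reference and publish it under the lock. */
static void model_connect(struct model_sock *s, struct model_dev *d)
{
	pthread_mutex_lock(&s->lock);
	d->refcnt++;
	s->dev = d;
	s->state |= MODEL_CONNECTED;
	pthread_mutex_unlock(&s->lock);
}

/*
 * Flush/release path: the flag test and the reference drop happen under
 * the same lock, so two callers cannot both drop the reference.
 * Returns 1 if this call dropped it, 0 if there was nothing to do.
 */
static int model_flush(struct model_sock *s)
{
	int dropped = 0;

	pthread_mutex_lock(&s->lock);
	if ((s->state & MODEL_CONNECTED) && s->dev) {
		s->state &= ~MODEL_CONNECTED;
		s->dev->refcnt--;
		s->dev = NULL;
		dropped = 1;
	}
	pthread_mutex_unlock(&s->lock);
	return dropped;
}

/* User path: only looks at the device while locked and connected. */
static int model_use(struct model_sock *s)
{
	int ok = 0;

	pthread_mutex_lock(&s->lock);
	if (s->state & MODEL_CONNECTED) {
		assert(s->dev != NULL);
		ok = 1;
	}
	pthread_mutex_unlock(&s->lock);
	return ok;
}
```

In this model a second model_flush() after the first is a no-op, which is the
property the unlocked dev_put() in the quoted hunk lacks. Whether every real
call site can take the socket lock in its context is a separate question that
this sketch does not address.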