On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> Michal Ostrowski wrote:
>> Here's a bigger patch that just gets rid of flush_lock altogether.
>>
>> We were seeing oopses due to net namespaces going away while we were using
>> them, which turns out to be simply because pppoe wasn't claiming ref
>> counts properly.
>>
>> Fixing this requires incrementing and decrementing the ref count when
>> adding and removing entries in the per-net hash table. This also allows
>> us to get rid of the flush_lock since we can now depend on the existence
>> of "pn->hash_lock".
>>
>> We also have to be careful when flushing devices that removal of a hash table
>> entry may bring the net namespace refcount to 0.
>>
>
> Your patch is mangled (tabulation -> white spaces),

Patch mangling was due to mailer interactions. I'll attach a clean version
here, no more inlining.

>
> and I don't believe namespace refcount can reach 0 inside pppoe_flush_dev(),
> it would be a bug from core network code.
>

From the original oops I was able to deduce that the namespace somehow
managed to get destroyed during the interval where we dropped locks. If
that's not due to the release_sock() call in pppoe_flush_dev() triggering a
cleanup, then I'd have to assume that it's due to a secondary actor closing
the socket in parallel, but that in turn would point to issues with the
flush_lock.

Having said that, the thrust of this patch remains valid; it just means I
don't need to inc the ref count in pppoe_flush_dev().

Do you agree?

--
Michal Ostrowski
mostrows@xxxxxxxxx
Attachment:
0001-PPPoE-Fix-ref-counts-on-net-namespaces.patch
Description: Binary data
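
For readers without the attachment, here is a rough userspace sketch of the ref-counting rule described above: every entry added to the per-net hash table takes a reference on its namespace, and every removal drops it. This is not the attached patch. All names (toy_net, toy_table, toy_insert, toy_remove and so on) are hypothetical stand-ins rather than kernel structures or APIs, and locking (the role played by "pn->hash_lock") is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a net namespace; not the kernel's struct net. */
struct toy_net {
	int refcount;   /* number of holders keeping the namespace alive */
	int destroyed;  /* set once the last reference has been dropped */
};

/* Hypothetical hash-table entry that pins the namespace it lives in. */
struct toy_entry {
	struct toy_net *net;
	struct toy_entry *next;
	int sid;        /* illustrative lookup key */
};

#define TOY_HASH_SIZE 16

/* Hypothetical per-namespace table (the analogue of the per-net hash). */
struct toy_table {
	struct toy_net *net;
	struct toy_entry *slots[TOY_HASH_SIZE];
};

static void toy_net_hold(struct toy_net *net)
{
	assert(!net->destroyed);  /* taking a ref on a dead namespace is a bug */
	net->refcount++;
}

static void toy_net_put(struct toy_net *net)
{
	assert(net->refcount > 0);
	if (--net->refcount == 0)
		net->destroyed = 1;   /* real code would tear the namespace down */
}

/* Adding an entry takes a namespace reference on its behalf. */
static void toy_insert(struct toy_table *t, struct toy_entry *e)
{
	unsigned int idx = (unsigned int)e->sid % TOY_HASH_SIZE;

	toy_net_hold(t->net);
	e->net = t->net;
	e->next = t->slots[idx];
	t->slots[idx] = e;
}

/* Removing an entry drops that reference. Returns 1 if sid was found. */
static int toy_remove(struct toy_table *t, int sid)
{
	struct toy_entry **pp = &t->slots[(unsigned int)sid % TOY_HASH_SIZE];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->sid == sid) {
			struct toy_entry *e = *pp;

			*pp = e->next;        /* unlink before dropping the ref */
			e->next = NULL;
			toy_net_put(e->net);  /* may be the last reference */
			e->net = NULL;
			return 1;
		}
	}
	return 0;
}
```

In this sketch the table never outlives a namespace that still has entries in it, since each entry holds its own reference. Note that toy_remove() can drop the final reference, which is the situation debated above for pppoe_flush_dev(): a caller that keeps using the namespace after a removal needs a reference of its own.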