On Thu, Aug 18, 2016 at 08:58:31AM +0800, Feng Gao wrote: > On Thu, Aug 18, 2016 at 1:42 AM, Guillaume Nault <g.nault@xxxxxxxxxxxx> wrote: > > On Tue, Aug 16, 2016 at 07:33:38PM +0800, fgao@xxxxxxxxxx wrote: > >> From: Gao Feng <fgao@xxxxxxxxxx> > >> > >> PPP channel holds one spinlock before send frame. But the skb may > >> select the same PPP channel with wrong route policy. As a result, > >> the skb reaches the same channel path. It tries to get the same > >> spinlock which is held before. Bang, the deadlock comes out. > >> > > Unless I misunderstood the problem you're trying to solve, this patch > > doesn't really help: deadlock still occurs if the same IP is used for > > L2TP and PPP's peer address. > > > > The deadlock happens because the same cpu try to hold the spinlock > which is already held before by itself. > Now the PPP_CHANNEL_LOCK_BH sets the lock owner after hold lock, then > when the same cpu > invokes PPP_CHANNEL_LOCK_BH again. The cl.owner equals current cpu id, > so it only increases > the lock_cnt without trying to hold the lock again. > So it avoids the deadlock. > I'm sorry but, again, it just _moves_ the deadlock down to L2TP. The kernel still oops because, now, l2tp_xmit_skb() is called recursively while holding its tunnel socket. > >> Now add one lock owner to avoid it like xmit_lock_owner of > >> netdev_queue. Check the lock owner before try to get the spinlock. > >> If the current cpu is already the owner, needn't lock again. When > >> PPP channel holds the spinlock at the first time, it sets owner > >> with current CPU ID. > >> > > I think you should forbid lock recursion entirely, and drop the packet > > if the owner tries to re-acquire the channel lock. Otherwise you just > > move the deadlock down the stack (l2tp_xmit_skb() can't be called > > recursively). > > The reason that fix it in ppp is that there are other layer on the ppp module. > We resolve it in ppp module, it could avoid all similar potential issues. > Not sure if I understand your point here. The xmit path of PPP and its sub-layers hasn't been designed to be reentrant. Allowing recursive sends thus require to review the full path. Beyond the L2TP issue discussed above, just consider the locking dependencies used in PPP: ppp->wlock has to be held before channel->downl. Sending a packet directly on a channel will lock channel->downl. If this packet is routed back to the parent unit then ppp_xmit_process() will lock ppp->wlock, effectively leading to lock inversion and potential deadlock. So we have two options: adapt the whole xmit path to handle recursive sends or forbid recursion entirely. Unfortunately none of these options looks easy to achieve: * Making PPP xmit path reentrant will be hard and error prone because of all the locking dependencies. Looks like simplifying PPP's locking scheme will be required first. * I can't see any way to reliably prevent settings where a packet sent on a given channel would be routed back to the parent unit. OTOH, we can try to limit the impact of recursive sends for simple cases: * Following your approach, either adapt the lower layers (like l2tp_xmit_skb() for L2TP), or drop the packet when cl.owner == smp_processor_id(). This is very limited in scope and doesn't address issues like locking inversions. But it may let the system survive long enough for the PPP to time out. * In the lower layer, check where the packet is going to be enqueued and drop it if it's the parent device. That should reliably handle simple and common cases. However this requires to update at least L2TP and PPTP and to get a way to access the parent device. Also, it doesn't prevent recursion with stacked interfaces. -- To unsubscribe from this list: send the line "unsubscribe linux-ppp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html