Hi Guillaume, Thanks your detail analyses. Now I think it is a good solution that just drop the packet and print error log instead my original solution that supports reentrant. This solution will not bring any side effects. I will send one update according to this new solution. Regards Feng On Fri, Aug 19, 2016 at 12:13 AM, Guillaume Nault <g.nault@xxxxxxxxxxxx> wrote: > On Thu, Aug 18, 2016 at 08:58:31AM +0800, Feng Gao wrote: >> On Thu, Aug 18, 2016 at 1:42 AM, Guillaume Nault <g.nault@xxxxxxxxxxxx> wrote: >> > On Tue, Aug 16, 2016 at 07:33:38PM +0800, fgao@xxxxxxxxxx wrote: >> >> From: Gao Feng <fgao@xxxxxxxxxx> >> >> >> >> PPP channel holds one spinlock before send frame. But the skb may >> >> select the same PPP channel with wrong route policy. As a result, >> >> the skb reaches the same channel path. It tries to get the same >> >> spinlock which is held before. Bang, the deadlock comes out. >> >> >> > Unless I misunderstood the problem you're trying to solve, this patch >> > doesn't really help: deadlock still occurs if the same IP is used for >> > L2TP and PPP's peer address. >> > >> >> The deadlock happens because the same cpu try to hold the spinlock >> which is already held before by itself. >> Now the PPP_CHANNEL_LOCK_BH sets the lock owner after hold lock, then >> when the same cpu >> invokes PPP_CHANNEL_LOCK_BH again. The cl.owner equals current cpu id, >> so it only increases >> the lock_cnt without trying to hold the lock again. >> So it avoids the deadlock. >> > I'm sorry but, again, it just _moves_ the deadlock down to L2TP. The > kernel still oops because, now, l2tp_xmit_skb() is called recursively > while holding its tunnel socket. > >> >> Now add one lock owner to avoid it like xmit_lock_owner of >> >> netdev_queue. Check the lock owner before try to get the spinlock. >> >> If the current cpu is already the owner, needn't lock again. When >> >> PPP channel holds the spinlock at the first time, it sets owner >> >> with current CPU ID. >> >> >> > I think you should forbid lock recursion entirely, and drop the packet >> > if the owner tries to re-acquire the channel lock. Otherwise you just >> > move the deadlock down the stack (l2tp_xmit_skb() can't be called >> > recursively). >> >> The reason that fix it in ppp is that there are other layer on the ppp module. >> We resolve it in ppp module, it could avoid all similar potential issues. >> > Not sure if I understand your point here. > The xmit path of PPP and its sub-layers hasn't been designed to be > reentrant. Allowing recursive sends thus require to review the full > path. > > Beyond the L2TP issue discussed above, just consider the locking > dependencies used in PPP: ppp->wlock has to be held before > channel->downl. Sending a packet directly on a channel will lock > channel->downl. If this packet is routed back to the parent unit then > ppp_xmit_process() will lock ppp->wlock, effectively leading to lock > inversion and potential deadlock. > > So we have two options: adapt the whole xmit path to handle recursive > sends or forbid recursion entirely. Unfortunately none of these options > looks easy to achieve: > > * Making PPP xmit path reentrant will be hard and error prone because > of all the locking dependencies. Looks like simplifying PPP's > locking scheme will be required first. > > * I can't see any way to reliably prevent settings where a packet sent > on a given channel would be routed back to the parent unit. > > > OTOH, we can try to limit the impact of recursive sends for simple > cases: > > * Following your approach, either adapt the lower layers > (like l2tp_xmit_skb() for L2TP), or drop the packet when > cl.owner == smp_processor_id(). This is very limited in scope and > doesn't address issues like locking inversions. But it may let the > system survive long enough for the PPP to time out. > > * In the lower layer, check where the packet is going to be enqueued > and drop it if it's the parent device. That should reliably handle > simple and common cases. However this requires to update at least > L2TP and PPTP and to get a way to access the parent device. Also, > it doesn't prevent recursion with stacked interfaces. -- To unsubscribe from this list: send the line "unsubscribe linux-ppp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html