On 2015/5/20 21:22, Petr Mladek wrote:
> On Tue 2015-05-19 14:57:46, Petr Mladek wrote:
>> On Tue 2015-05-19 09:08:45, Wang Long wrote:
>>> This is my backport patch series to fix the following problem
>>> (backport to 3.10):
>>> "
>>> When trigger_all_cpu_backtrace() is called on x86, it will trigger an
>>> NMI on each CPU and call show_regs(). But this can lead to a hard lock
>>> up if the NMI comes in on another printk().
>>> "
>>> The solution is described in commit
>>> "a9edc88093287183ac934be44f295f183b2c62dd":
>>> when the NMI triggers, it switches the printk routine for that CPU to
>>> call an NMI-safe printk function that records the printk in a per_cpu
>>> seq_buf descriptor. After all NMIs have finished recording their data,
>>> the trace_seqs are printed in a safe context.
>>>
>>> The solution uses the "switch printk routine" and "seq_buf"
>>> infrastructures, but 3.10 stable has neither of them.
>>>
>>> Patches 1-13 backport the "seq_buf" infrastructure. In detail, patches
>>> 1, 2 and 6 backport only "seq_buf"-related code.
>>>
>>> Patches 14-15 backport the "switch printk routine".
>>>
>>> Patches 16-17 are the change that prints all CPU stacks from NMI safely.
>>>
>>> As discussed in https://lkml.org/lkml/2015/5/13/497, in 3.10 stable this
>>> is the only way to solve the problem, so the backport is somewhat larger.
>>>
>>> v1 -> v2:
>>> * fix the indent error.
>>> * rebase on 3.10.79
>>>
>>> Any thoughts?
>>
>> Please, wait with the integration. I am testing it with a storm of
>> sysrq requests:
>>
>> $> while true ; do echo l >/proc/sysrq-trigger ; done
>>
>> with iptables enabled:
>>
>> $> iptables -A INPUT -j LOG --log-prefix "incomming packet:"
>>
>> and a storm of pings from another machine:
>>
>> $> ping -f <patched-host>
>>
>>
>> The machine somehow freezes. It does not make sense. I am trying to
>> investigate.
>
> OK, it seems that the machine freezes because there are still a few
> messages printed in the NMI context, e.g.:
>
> [ 3080.286277] Uhhuh. NMI received for unknown reason 3d on CPU 12.
> [ 3637.939276] Uhhuh. NMI received for unknown reason 2d on CPU 13.
>
> I am not exactly sure why I get them on the test machine. But I get
> such messages from time to time when hammering it with the pings and
> sysrq-l requests.
>
> I modified vprintk_emit() to do raw_spin_trylock(&logbuf_lock)
> and to not try to lock the console in NMI context. The trylock fails
> from time to time but the machine no longer freezes.
>
> I am going to clean up the vprintk_emit() modification and send it for
> review.
>
> Anyway, this patch set seems to work as expected. It heavily reduces
> the risk of NMI/printk-related deadlocks => it is worth having.
>
> Feel free to use the following for the whole patchset (backport):
>
> Reviewed-by: Petr Mladek <pmladek@xxxxxxx>
> Tested-by: Petr Mladek <pmladek@xxxxxxx>

Hi Greg,

This patch set is the only way to solve the NMI/printk-related deadlock
problems. Could you please include it in 3.10 stable?

Although the amount of code is somewhat larger, most of it is the
"seq_buf" infrastructure, and it does not affect other parts of the
kernel.

Best Regards
Wang Long

>
>
> Best Regards,
> Petr
>
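
For readers following the thread, the idea discussed above (never block on
the log lock from NMI context, and record the message somewhere safe so it
can be printed later) can be illustrated with a small user-space analogy.
This is only a sketch under stated assumptions, not the kernel code and not
the actual vprintk_emit() modification: a pthread mutex stands in for
logbuf_lock, a single static buffer stands in for the per-CPU seq_buf, and
the names emit(), flush_deferred(), log_lock, log_buf and deferred_buf are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: log_lock plays the role of logbuf_lock,
 * deferred_buf plays the role of a per-CPU seq_buf. */
pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
char log_buf[1024];
char deferred_buf[256];

/* Append msg to the NUL-terminated string in dst, truncating if full. */
static void append(char *dst, size_t cap, const char *msg)
{
    assert(cap > 0);
    size_t used = strlen(dst);
    if (used < cap - 1)
        snprintf(dst + used, cap - used, "%s", msg);
}

/*
 * Emit one message. In the simulated NMI context we only *try* to take
 * the lock; if it is busy, the message goes to the deferred buffer
 * instead of spinning forever. Returns true if the message reached
 * log_buf directly, false if it was deferred.
 */
bool emit(const char *msg, bool in_nmi)
{
    if (in_nmi) {
        if (pthread_mutex_trylock(&log_lock) != 0) {
            append(deferred_buf, sizeof(deferred_buf), msg);
            return false;
        }
    } else {
        pthread_mutex_lock(&log_lock);
    }
    append(log_buf, sizeof(log_buf), msg);
    pthread_mutex_unlock(&log_lock);
    return true;
}

/* Called later from a safe (non-NMI) context to print what was deferred. */
void flush_deferred(void)
{
    pthread_mutex_lock(&log_lock);
    append(log_buf, sizeof(log_buf), deferred_buf);
    deferred_buf[0] = '\0';
    pthread_mutex_unlock(&log_lock);
}
```

A caller would use emit(msg, true) from the simulated NMI path and call
flush_deferred() once it is safe to take the lock. The sketch ignores
everything that makes the real problem hard (per-CPU storage, console
locking, real interrupt semantics); it only shows why trylock plus a side
buffer avoids the deadlock.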