Michal Hocko wrote:
> On Fri 15-09-17 21:09:29, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 15-09-17 20:38:49, Tetsuo Handa wrote:
> > > [...]
> > > > You said "identify _why_ we see the lockup triggering in the first
> > > > place" without providing means to identify it. Unless you provide
> > > > means to identify it (in a form which can be immediately and easily
> > > > backported to 4.9 kernels; that is, backporting not-yet-accepted
> > > > printk() offloading patchset is not a choice), this patch cannot be
> > > > refused.
> > >
> > > I fail to see why. It simply works around an existing problem elsewhere
> > > in the kernel without deeper understanding on where the problem is. You
> > > can add your own instrumentation to debug and describe the problem. This
> > > is no different to any other kernel bugs...
> >
> > Please do show us your patch for that. Normal users cannot afford developing
> > such instrumentation to debug and describe the problem.
>
> Stop this nonsense already! Any kernel bug/lockup needs a debugging
> which might be non-trivial and it is necessary to understand the real
> culprit. We do not add random hacks to silence a problem. We aim at
> fixing it!

Assuming that Wang Yu's trace has

  RIP: 0010:[<...>] [<...>] dump_stack+0x.../0x...

line in the omitted part (like Cong Wang's trace did), I suspect that a thread
which is holding dump_lock is unable to leave console_unlock() from printk()
for so long because many other threads are trying to call printk() from
warn_alloc() while consuming all CPU time.

Thus, not allowing other threads to consume CPU time / call printk() is a
step for isolating it. If this problem still exists even if we made other
threads sleep, the real cause will be somewhere else.

But unfortunately Cong Wang has not yet succeeded in reproducing the problem.
If Wang Yu is able to reproduce the problem, we can try writing 1 to
/proc/sys/kernel/softlockup_all_cpu_backtrace so that we can know what
other CPUs are doing.

>
> > > If our printk implementation is so weak it cannot cope with writers then
> > > that should be fixed without spreading hacks in different subsystems. If
> > > the lockup is a real problem under normal workloads (rather than
> > > artificial ones) then we should try to throttle more aggressively.
> >
> > No throttle please. Throttling makes warn_alloc() more and more useless.
>
> so does try_lock approach...

There is the mutex_lock() approach, but you don't agree on using it.
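
For the softlockup_all_cpu_backtrace suggestion above, here is a minimal sketch of how the knob could be set from a script. It is only an illustration: the helper names set_sysctl() and demo() are hypothetical, writing to the real /proc path needs root and a kernel built with the soft lockup detector, and demo() deliberately writes to a scratch file instead of the real sysctl.

```python
import os
import tempfile

# Path named in the message above; writing to it requires root.
SYSCTL_PATH = "/proc/sys/kernel/softlockup_all_cpu_backtrace"


def set_sysctl(value, path=SYSCTL_PATH):
    """Write an integer sysctl value and return the value read back."""
    with open(path, "w") as f:
        f.write(f"{int(value)}\n")
    with open(path) as f:
        return int(f.read().strip())


def demo():
    """Exercise set_sysctl() against a scratch file (no root needed)."""
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    try:
        return set_sysctl(1, path=scratch)
    finally:
        os.remove(scratch)
```

On a test machine, calling set_sysctl(1) as root should have the same effect as echoing 1 into that file by hand. With the knob enabled, a soft lockup report is expected to include backtraces from the other CPUs as well, which is the information needed to see what they were doing while dump_lock was held.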