On Wed, 24 Jan 2018 00:21:30 +0900 Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx> wrote: > On (01/23/18 09:56), Steven Rostedt wrote: > [..] > > > Why do we even use irq_work for printk_safe? > > > > Why not? > > > > Really, I think you are trying to solve a symptom and not the problem. > > If we are having issues with irq_work, we are going to have issues with > > a work queue. It's just spreading out the problem instead of fixing it. > > I don't want to have heuristics in print_safe, I don't want to have a magic > number controlled by a user-space visible knob, I don't want to have the > first 3 lines of a lockdep splat. We can have more. But if printk is causing printks, that's a major bug. And work queues are not going to fix it, it will just spread out the pain. Have it be 100 printks, it needs to be fixed if it is happening. And having all printks just generate more printks is not helpful. Even if we slow them down. They will still never end. A printk causing a printk is a special case, and we need to just show enough to let the user know that its happening, and why printks are being throttled. Yes, we may lose data, but if every printk that goes out causes another printk, then there's going to be so much noise that we wont know what other things went wrong. Honestly, if someone showed me a report where the logs were filled with printks that caused printks, I'd stop right there and tell them that needs to be fixed before we do anything else. And if that recursion is happening because of another problem, I don't want to see the recursion printks. I want to see the printks that show what is causing the recursions. > The problem is - we flush printk_safe too soon and printing CPU ends up > in a lockup - it log_store()-s new messages while it's printing the pending No, the problem is that printks are causing more printks. Yes that will make flushing them soon more likely to lock up the system. But that's not the problem. The problem is printks causing printks. > ones. It's fine to do so when CPU is in preemptible context. Really, we > should not care in printk_safe as long as we don't lockup the kernel. The > misbehaving console must be fixed. If CPU is not in preemptible context then > we do lockup the kernel. Because we flush printk_safe regardless of the > current CPU context. If we will flush printk_safe via WQ then we automatically And if we can throttle recursive printks, then we should be able to stop that from happening. > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we > will not lockup it up." thing. Yes, we fill up the logbuf with probably needed > and appreciated or unneeded messages. But we should not care in printk_safe. > We don't lockup the kernel... And the misbehaving console must be fixed. I agree. > > I disagree with "If we are having issues with irq_work, we are going to have > issues with a work queue". There is a tremendous difference between irq_work > on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU > context, the other one does. But switching to work queue does not address the underlining problem that printks are causing more printks. -- Steve -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>