On (11/24/18 09:24), Tetsuo Handa wrote: > >> Steven told me on Plumbers conference that even few initial > >> characters saved him a day few times. > > > > Yes, and that has happened more than once. I would reboot and retest > > code that is crashing, and due to a triple fault, the machine would > > reboot because of some race, and the little output I get from the > > console would help tremendously. > > > > Remember, debugging the kernel is a lot like forensics, especially when > > it's from a customer's site. You look at all the evidence that you can > > get, and sometimes it's just 10 characters in the output that gives you > > an idea of where things went wrong. I'm really not liking the buffering > > idea because of this. > > Then, we should not enforce buffering in a way that requires modification of > printk() callers. That is, we should not ask printk() callers to use their > private buffer. What we can do is to enable/disable line buffering inside > printk() depending on the problem the user wants to debug. Right; overall I tend to agree with what you guys are saying and I like Petr's "I am more and more wondering if the buffered printk is worth the effort" comment; and I like Steven's comment on flushes; and admire Tetsuo's efforts. I think that printk_seq_buf/printk_buffer was never going to replace pr_cont() and I never liked the idea. The printk_safe proposal for lockdep had one OK thing about it - it would pass our normal marshaling before it would reach the buffering stage. Which means - no buffering for people who "detest" printk buffering. This looks better: printk->vprint_func->{early_printk/printk_safe/vprintk_emit}->buffering Than this: pr_buffer->buffering->vprintk_func->{early_printk/printk_safe/vprintk_emit} Another thing is - printk seq_buf/printk_buffer doesn't really solve any problem. People, who can use seq_buf/char buf[256]/etc. buffering, already can do so; people who cannot - won't switch to a new buffering printk anyway. The bad thing about printk_safe proposal is that it's per-CPU; which is OK for some paths (like lockdep), but not OK in general (e.g. OOM). IMO, try_buffered_printk() attempts to solve the problem at the right place - printk. And it does not break our normal marshaling, so we don't "fix" printk users and we keep people, who does vprintk_func->early_printk thing, happy. So I don't dislike try_buffered_printk() approach. And unlike before, now we are talking about a single line buffering. If we'd walk this way, I would prefer to NOT introduce any structs and any new code, or any new "split and log_store() in the middle" rules. Just a bunch of "struct cont" buffers: static struct cont conts[N]; and cont_add()/cont_flush() to handle pr_cont, with all the flushes it does; but on a per-context basis. conts[0] should serve as a fallback cont buffer, in case if there are no available cont buffers left. flush_on_panic() is still miserable, for sure; probably we can do something about it. Or... Instead. We can just leave pr_cont() alone for now. And make it possible to reconstruct messages - IOW, inject some info to printk messages. We do this at Samsung (inject CPU number at the beginning of every message. `cat serial.0 | grep "\[1\]"` to grep for all messages from CPU1). Probably this would be the simplest thing. > Also, we should allow disabling "struct cont" depending on the problem (in > order to allow flushing the 10 characters in the "cont" buffer). > > By the way, is the comment > > /* > * Continuation lines are buffered, and not committed to the record buffer > * until the line is complete, or a race forces it. The line fragments > * though, are printed immediately to the consoles to ensure everything has > * reached the console in case of a kernel crash. > */ printk does not do this anymore; you are right. > appropriate despite we don't call cont_flush() upon a kernel crash? I tend to count on flush_on_panic more than on a "last moment" pr_cont->cont_flush(), which was guaranteed to happen immediately only with early_con. A kernel crash usually has enough pr_emerg/printk-s to force cont flush. Even pr_info() will do. We look at the loglevel much later; so even messages which never make it to the consoles still flush cont buffer. -ss