On Tue, Aug 13, 2013 at 08:13:56PM +0000, Luck, Tony wrote:
> Generic tracepoints are architected to be able to fire at very high
> rates and log huge amounts of information. So we'd need something
> special to say just log these special tracepoints to network/serial.
>
> > Which reminds me, pstore could also be a good thing to use, in addition.
> > Only put error info there as it is limited anyway.
>
> Yes - space is very limited. I don't know how to assign priority for logging
> the dmesg data vs. some error logs.

Didn't we say at some point, "log only the panic message which kills the
machine"?

However, we could probably also use the messages before that
catastrophic event, because they could give us hints about what led to
the panic. But in that case, maybe a limited pstore is the wrong logging
medium.

Actually, I can imagine the full serial/network logs of "special"
tracepoints + dmesg to be the optimal thing.

> If we just "printk()" the most important parts - then that data will
> automatically flow to the serial console and to pstore.

Actually, does pstore act like a circular buffer? Because if it contains
the last N relevant messages (for an arbitrary definition of relevant)
before the system dies, then that could be more helpful than only the
error messages.

And with the advent of UEFI, pretty much every system has a pstore. Too
bad that we have to limit it to 50% of its size so that the boxes don't
brick. :-P

> Then we have multiple paths for the critical bits of the error log
> - and the tracepoints give us more details for the cases where the
> machine doesn't spontaneously explode.

Ok, let's sort:

* First we have the not-so-critical hw error messages. We want to carry
  those out-of-band, i.e. not in dmesg, so that people don't have to
  parse and collect dmesg but instead have a specialized solution which
  gives them structured logs, and tools can then analyze, collect and
  ... those errors.
* When a critical error happens, the above approach is not necessarily
  advantageous anymore: in order to debug what caused the machine to
  crash, we don't want only the crash message but also the whole system
  activity that led to it. In that case, we probably actually want to
  turn off/ignore the error logging tracepoints and write *only* to
  dmesg, which goes out over serial and to pstore.

  Right? Because in such cases I want to have *all* *relevant* messages
  that led to the explosion + the explosion message itself.

Makes sense? Yes, no? Aspects I've missed?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.