RE: [RFC][PATCH v2 2/3] Hold multiple logs

Seiji Aguchi <seiji.aguchi@xxxxxxx> · Fri, 20 Jul 2012 00:39:24 +0000

Thank you for describing this in detail.

> Yes - if the OOPs is instrumental in the path leading to the hang/panic - then the OOPS is the first place to look for the root cause of
> the problem. But it will be a case by case analysis.
> Sometimes the OOPS might be unconnected. If possible we'd like to log more information to allow detective work to decide whether
> there is a connection. But as I mentioned above there are severe limits to how much better things are by storing more information.

I understand why you think 3 or 4 logs are reasonable.
There are some cases where the 2nd or 3rd oops is the critical one.

I have some enterprise customers who are sensitive to software failures and specify panic_on_oops=1.
In this case, they don't need 3 or 4 logs; 2 logs are enough.

So, the kernel parameter should work as follows.

Log_num=1
  - For users who want to hold just one log.

Log_num=2
  - For users who can handle multiple logs and care about the 1st oops (by specifying panic_on_oops=1).

Log_num=3,4
  - For users who care about the 2nd or 3rd oops.

Log_num=5 or more
  - Invalid value.
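
As a rough sketch of the range check implied by the list above (this is not code from the patch; the names validate_log_num() and describe_log_num() and the LOG_NUM_MIN/LOG_NUM_MAX macros are hypothetical, and rejecting values below 1 is my assumption, since only "5 or more" is stated as invalid):

```c
#include <errno.h>

/* Hypothetical bounds taken from the proposal above, not from a merged patch. */
#define LOG_NUM_MIN 1
#define LOG_NUM_MAX 4

/*
 * Return 0 if log_num is in the proposed range, -EINVAL otherwise.
 * Rejecting values below 1 is an assumption; the proposal only says
 * that 5 or more is invalid.
 */
int validate_log_num(int log_num)
{
	if (log_num < LOG_NUM_MIN || log_num > LOG_NUM_MAX)
		return -EINVAL;
	return 0;
}

/* Map each proposed value to the use case it is meant for. */
const char *describe_log_num(int log_num)
{
	switch (log_num) {
	case 1:
		return "hold just one log";
	case 2:
		return "multiple logs, 1st oops matters (panic_on_oops=1)";
	case 3:
	case 4:
		return "2nd or 3rd oops matters";
	default:
		return "invalid value";
	}
}
```

With this, a value of 2 would be accepted for the panic_on_oops=1 users, and 5 would be rejected with -EINVAL.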

If I have misunderstood anything, please let me know.

Seiji

> -----Original Message-----
> From: Luck, Tony [mailto:tony.luck@xxxxxxxxx]
> Sent: Thursday, July 19, 2012 7:42 PM
> To: Seiji Aguchi; linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; mikew@xxxxxxxxxx; dzickus@xxxxxxxxxx; Matthew
> Garrett (mjg@xxxxxxxxxx)
> Cc: dle-develop@xxxxxxxxxxxxxxxxxxxxx; Satoru Moriya
> Subject: RE: [RFC][PATCH v2 2/3] Hold multiple logs
> 
> > If you are concerned about multiple OOPS case, I think an user app which logs from /dev/pstore to /var/log should be developed.
> 
> Agreed - we need an app/daemon to do this.
> 
> > Once it is developed, we don't need to care about multiple oops case and the appropriate number is two.
> 
> Only if you can guarantee that the app/daemon will run and save the first OOPS before the next occurs. Even if the system were
> running normally this might be difficult to achieve.. but in this case we know the system isn't running normally (it just OOPSed twice!).
> 
> However - there is progressively less value in collecting additional consecutive OOPS. Perhaps one is enough 90% or even 99% of the
> time. I'm naturally paranoid so having two or three would make me feel happy that most of the remaining 10% or 1% of the cases
> were covered.
> 
> > - In case where system is workable after oops.
> > The user app will erase an entry in NVRAM.
> > And we can get the message via /var/log.
> 
> Yes - the system can keep running after many types of OOPs - so the OOPS will be logged in /var/log (or by the app/daemon copying
> from pstore, or both).
> 
> > - In case where system hangs up or panics due to the oops.
> > Oops is the critical message and we don't need care about subsequent events.
> 
> Yes - if the OOPs is instrumental in the path leading to the hang/panic - then the OOPS is the first place to look for the root cause of
> the problem. But it will be a case by case analysis.
> Sometimes the OOPS might be unconnected. If possible we'd like to log more information to allow detective work to decide whether
> there is a connection. But as I mentioned above there are severe limits to how much better things are by storing more information.
> 
> -Tony