On Mon, Sep 17, 2007 at 06:38:53PM -0700, Randy Dunlap wrote:
> On Thu, 13 Sep 2007 07:21:10 -0600 Eric W. Biederman wrote:
>
> > Pete/Piet Delaney <pete at bluelane.com> writes:
> >
> > > Jason, Eric:
> > >
> > > Did you read Keith Owens suggestion on RAS tools from:
> >
> Yes. and I re-read it.
>
> There are several things in Keith's email that make sense:
>
> a. all RAS tools should use a common interface
> b. it's not the kernel's job to decide which RAS tool runs first
>
>
> Eric makes some good points too. I'm mostly similar to Eric:
> paranoid about trusting software/hardware after a panic (or oops).
>
> So if someone wants to use multiple RAS tools on a panic event,
> enabling an admin to set priorities is OK with me, but I'll only
> trust the first one that is used, and even that one may have
> problems. IOW, I don't see a big need to support multiple RAS
> tools at one time. (speaking for myself)
>

It would be nice to have a kernel debugger co-exist with crash dumping.
I like Eric's idea of the debugger putting a breakpoint on panic().

This would mean that the rest of the post-panic() actions have to be
performed by the second kernel, which can perform those actions much
more reliably. But this also brings in the additional requirement of
passing all the required context to the second kernel.

For example, in the past somebody wanted to send a message to a remote
node saying that the system had crashed, so that a standby could take
over. If the same job has to be done in the second kernel, all the
relevant information (remote host IP, port, etc.) has to be passed to
the second kernel, which I think makes the job a little harder.

Maybe one can pre-configure these parameters in user space and let the
job be done either from the initrd or from user space scripts in the
second kernel.

Thanks
Vivek