> A general note here. It does not appear that you implement the
> error recovery states in your state machine. If the system fails
> in the middle of doing an IPMI operation, it is likely to fail.

The reason I didn't implement the error handling is that I think
the error rate is low, while error recovery may take many seconds
(I don't have any statistical data; that's just my expectation).
The most important thing is to start booting the second kernel
reliably and as soon as possible. For example, if a user uses a
feature like fence_kdump and the execution of fence_kdump gets
delayed, the crashed host will be shot down by another host that
is waiting for the notification from fence_kdump. Keeping the code
simple is also important for reliability.

Anyway, I'll reconsider whether I can implement the error handling
with simple logic.

> If you do this you will need to detect and abort any running
> operation. Implementing the full state machine is probably the
> best approach, it should handle this, though it is rather complex.
>
> -corey

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group