Re: Recovering Linux system from hung state via software

On Wed, Dec 4, 2013 at 4:13 PM, Mandeep Sandhu <mandeepsandhu.chd@xxxxxxxxx> wrote:
> Assume one mother process is monitoring 10 child processes. Inside each
> child process, simply set up a PERIODIC (e.g., every 5 sec) mechanism to
> set a binary variable through some IPC means. The mother process resets it
> as it goes around checking every variable's status; if a variable has not
> been set again by the next check, that implies the particular process might
> be hung. The mother can wait further, or continue checking the other
> processes. At the end of checking ALL the processes, if everything is OK,
> it should feed the kernel watchdog timer. If the kernel watchdog timer is
> not fed in time, the kernel module will then reboot the system (i.e., the
> reboot comes from the kernel module).
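
For illustration only, here is a rough sketch of the heartbeat bookkeeping described in the quoted text. It is not code from this thread. The names (struct heartbeat, child_beat, monitor_pass) are made up, and it assumes the reading where each child sets its flag and the mother clears it. In a real setup the table would sit in shared memory or some other IPC channel; a plain array stands in for that here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heartbeat slot, one per child. In practice this would live
 * in shared memory so that each child can write its own slot. */
struct heartbeat {
    volatile bool alive;   /* set by the child, cleared by the mother */
};

/* Called by a child every few seconds to show it is still running. */
void child_beat(struct heartbeat *hb)
{
    hb->alive = true;
}

/* One monitoring pass by the mother process: count the children whose
 * flag was not set since the previous pass, then clear every flag so the
 * next pass starts fresh. Returns the number of possibly hung children. */
size_t monitor_pass(struct heartbeat *table, size_t n)
{
    size_t hung = 0;

    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!table[i].alive)
            hung++;
        table[i].alive = false;
    }
    return hung;
}
```

The mother would feed the kernel watchdog (on Linux, typically by writing to /dev/watchdog) only when monitor_pass() returns 0. If it stops feeding, the watchdog driver reboots the machine once its timeout expires.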

Hold on! Why should we reboot the whole system if only some of these
processes are misbehaving? Why should the other processes suffer because of
this? Wouldn't it be better to just kill the erroneous process (as most
OSes do anyway, e.g. "Force Quit" in Ubuntu, or Chrome tabs)?


In much COTS software, the behavior of each process depends heavily on the others: some of them talk to hardware, while others just process the intermediate data. When something goes wrong, it is difficult to diagnose the fault in real time (i.e., with a self-diagnosis mechanism), which is why fault logging is important and is always done to flash or hard disk rather than to a temporary filesystem. So it is better to reboot. Yes, not every process needs to trigger a reboot, so design it with care. For example, an Apache server can always afford to be killed and restarted.
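
One way to picture "design it with care" is a per-process policy that says whether a hang means "restart just this process" or "let the whole system reboot". The sketch below is only an assumption about how that could be laid out; the enum, the struct and the function name are hypothetical and do not come from any existing tool.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes of one monitoring round (hypothetical names). */
enum recovery {
    RECOVER_NONE,             /* everything reported in */
    RECOVER_RESTART_PROCESS,  /* kill and restart only the hung process(es) */
    RECOVER_REBOOT_SYSTEM     /* stop feeding the watchdog, let it reboot */
};

/* Per-child policy: is this process one the others depend on? */
struct child_policy {
    const char *name;   /* illustrative label only */
    int critical;       /* nonzero: a hang here warrants a full reboot */
};

/* Decide what to do given which children look hung (hung[i] != 0).
 * Any hung critical child forces a reboot; otherwise a hung
 * non-critical child only needs a restart. */
enum recovery decide_recovery(const struct child_policy *policy,
                              const int *hung, size_t n)
{
    enum recovery action = RECOVER_NONE;

    assert(policy != NULL && hung != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!hung[i])
            continue;
        if (policy[i].critical)
            return RECOVER_REBOOT_SYSTEM;
        action = RECOVER_RESTART_PROCESS;
    }
    return action;
}
```

With a table such as { {"hw-io", 1}, {"httpd", 0} }, a hung web server would yield RECOVER_RESTART_PROCESS, while a hung hardware-facing process would yield RECOVER_REBOOT_SYSTEM.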
 
Or are these processes the only ones running on the system?

-mandeep



--
Regards,
Peter Teoh
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
