Re: watchdog pet in kernel module

On Thu, Dec 5, 2013 at 10:19 AM, Rajat Sharma <fs.rajat@xxxxxxxxx> wrote:
Although /dev/watchdog is exposed to user mode, nothing should stop you from writing to it from a kernel thread.

Rajat

I don't think /dev/watchdog (I mean the literal path) is available inside the kernel. It is accessible from userspace, but the kernel refers to the device under a different name. Moreover, if you access the underlying variable directly, bypassing the spinlock implemented around it (see drivers/watchdog and look for the "wdt_lock" spinlock), you might run into a race condition.

BUT if you really insist on probing from inside the kernel, what you end up with is not a watchdog; it is a "process watch" of your own design.

That is, you can always write a loop that periodically probes the status of that specific process to make sure it is in the RUNNING state (as opposed to BLOCKED, when it is waiting for some I/O or for locks to be released), and perhaps checks the CPU instructions it is executing to make sure it has not gone into a tight loop (e.g., a userspace program that literally does "while(true) {do_nothing()}"). There are many other possible "hung" criteria for a process as well. This is not easy; it is extremely complex.
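
To make the state check concrete, here is a minimal userspace sketch (not kernel code) that reads the one-letter state field Linux exposes in /proc/<pid>/stat. The function names are hypothetical and chosen only for illustration; the only thing assumed about the system is the standard procfs layout.

```python
def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the LAST ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def read_state(pid):
    """Read the current state letter of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())


def looks_blocked(state):
    """'D' is uninterruptible sleep, typically waiting on I/O or a lock."""
    return state == "D"
```

Note that a process spinning in a tight loop reports 'R' (running), so a state check like this cannot tell a busy-but-healthy process from a stuck one. That is part of why the other "hung" criteria are so hard to get right.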
 


On Wed, Dec 4, 2013 at 5:50 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:



On Thu, Dec 5, 2013 at 9:06 AM, Vipul Jain <vipulsj@xxxxxxxxx> wrote:



On Wed, Dec 4, 2013 at 4:57 PM, <Valdis.Kletnieks@xxxxxx> wrote:
On Wed, 04 Dec 2013 16:45:44 -0800, Vipul Jain said:

> If you don't mind can you please provide me more insight as what can be
> false alarm I can encounter to move pet inside kernel module?

The issue isn't false alarms - it's failure to alarm when it should.

The problem is that it's possible for a kernel to get wedged in such a way that
a kernel thread is still able to feed the watchdog timer on a regular basis,
but userspace is effectively hung and unable to proceed.  For example, if an
OOPS happens while a filesystem lock is held, all future userspace references
to that filesystem (and possibly all filesystems of the same type) will hang,
eventually strangling the box while the kernel is still perfectly able to keep
the watchdog working.

Hi Valdis,

I see what you are saying, but what if the user process that's feeding the dog gets hung while the rest of the system is fine? Then it will bring the whole system down, won't it? I basically want to avoid this.


Normally the process that feeds the dog is a simple process that JUST periodically writes to the watchdog device file descriptor. Yes, one main() with a while loop that just periodically resets the timer through that descriptor.
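
A minimal sketch of such a feeder, assuming the standard Linux behaviour where any write to /dev/watchdog restarts the countdown. The function name and its parameters are hypothetical; the finite count exists only so the loop can be tried against an ordinary file instead of the real device.

```python
import os
import time


def feed_watchdog(path="/dev/watchdog", interval_s=10.0, count=None):
    """Write to the watchdog device every interval_s seconds.

    count=None loops forever; a finite count is only for trying this out.
    Returns the number of times the device was fed.
    """
    fd = os.open(path, os.O_WRONLY)
    fed = 0
    try:
        while count is None or fed < count:
            os.write(fd, b"\0")  # any write restarts the countdown
            fed += 1
            time.sleep(interval_s)
        # "Magic close": writing 'V' right before close asks the driver to
        # stop the timer. Whether this is honoured depends on the driver
        # and on the nowayout setting, so treat it as an assumption.
        os.write(fd, b"V")
    finally:
        os.close(fd)
    return fed
```

The interval has to be comfortably shorter than the hardware timeout. If this process hangs or is killed without reaching the 'V' write, the timer keeps running and the box reboots, which is exactly the intended behaviour.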

And so if it is not able to respond in time, by inference, OTHER PROCESSES must have hung. In another system I saw, there is a mother process that monitors a few (not all) of its key child processes, so perhaps each child has one variable to signal to the mother that it is running. If a child does not respond in time, the mother cleans up everything and then purposely stops petting the watchdog, resulting in a reboot.
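
The mother-process idea can be sketched as a small heartbeat table. This is only an illustration under assumptions: the class name, the child names and the timeout are all hypothetical, and how each child actually delivers its "I am running" signal (shared variable, pipe, socket) is left out.

```python
import time


class HeartbeatMonitor:
    """Track the last time each key child process reported in."""

    def __init__(self, names, timeout_s, now=None):
        start = time.monotonic() if now is None else now
        self.timeout_s = timeout_s
        self.last_seen = {name: start for name in names}

    def beat(self, name, now=None):
        """Record that a child has signalled it is still running."""
        self.last_seen[name] = time.monotonic() if now is None else now

    def stale(self, now=None):
        """Return the sorted names of children silent for longer than timeout_s."""
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)

    def should_pet(self, now=None):
        """Pet the watchdog only while every monitored child is fresh."""
        return not self.stale(now)
```

The mother would call should_pet() before each write to the watchdog device. Once it returns False, she cleans up and simply stops writing, and the hardware timer does the rest.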
 
Regards,
Vipul.




--
Regards,
Peter Teoh

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies





--
Regards,
Peter Teoh
