Re: ipmi watchdog questions

On Thu, May 01, 2014 at 07:38:18PM -0500, Corey Minyard wrote:
> On 05/01/2014 08:58 AM, Don Zickus wrote:
> > Hi Corey,
> >
> > I stumbled upon an issue with a partner of ours, where they booted their
> > machine and tried to load the ipmi_watchdog module by hand and it failed.
> >
> > The reason it failed was that the iTCO watchdog driver was already loaded
> > and it registered the misc device /dev/watchdog first.
> >
> > I looked at the ipmi watchdog driver and realized it was never converted
> > to the new watchdog framework where the watchdog_core module manages the
> > '/dev/watchdog' misc device.
> >
> > So being naive and not knowing much about IPMI, I decided to follow the
> > helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt
> > and convert the ipmi_watchdog to use the new watchdog framework.
> >
> > I ran into a few issues and then realized the driver itself never really
> > binds to any hardware, so it makes the conversion process a little more
> > challenging.
> >
> > So a few questions to you before I waste my time in this area:
> >
> > - Is there any prior history about why the ipmi_watchdog was never
> >   converted to the new watchdog framework?  Lack of interest? Technical
> > hurdles?
> 
> Mostly lack of interest, but there are some technical hurdles.

Hi Corey,

Thanks for all the responses.

> 
> It would be hard to implement some things.  The watchdog framework has
> no concept of pretimeouts.  And IPMI is message based, you send a
> message to a controller to do anything, and you have to wait for the
> response.  That doesn't work very well with the watchdog interface,
> which assumes you can do everything immediately.

I will defer this conversation to Guenter's expertise.  I am willing to hack
up any suggestions both of you come up with here to see if everything
works well (same goes for the fasync/poll stuff).
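
To make the mismatch Corey describes concrete, here is a toy userspace
sketch, not driver code.  Everything in it (struct fake_bmc, bmc_send_reset,
bmc_poll_reply, wdt_ping_sync, the three-step latency) is made up for
illustration.  It only shows the shape of the problem: a "ping" that the
caller expects to complete immediately has to send a message and then wait
for the controller's reply.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a message-based management controller. */
struct fake_bmc {
    bool request_pending;   /* a request was sent, no reply yet */
    int  steps_until_reply; /* simulated latency, in polling steps */
    int  countdown;         /* simulated watchdog countdown, seconds */
    int  timeout;           /* value reloaded by a "reset timer" message */
};

/* Send a "reset timer" message; returns before the reset has any effect. */
static int bmc_send_reset(struct fake_bmc *bmc)
{
    if (bmc->request_pending)
        return -1;          /* only one outstanding message in this toy */
    bmc->request_pending = true;
    bmc->steps_until_reply = 3;
    return 0;
}

/* Advance the simulation one step; returns true once the reply arrived. */
static bool bmc_poll_reply(struct fake_bmc *bmc)
{
    if (!bmc->request_pending)
        return false;
    if (--bmc->steps_until_reply > 0)
        return false;
    bmc->countdown = bmc->timeout;  /* the reset takes effect only now */
    bmc->request_pending = false;
    return true;
}

/*
 * Synchronous "ping" in the shape an immediate-style callback expects:
 * send the message, then wait for the reply before returning.
 * Returns 0 on success, -1 if sending failed, -2 if the wait gave up.
 */
static int wdt_ping_sync(struct fake_bmc *bmc, int max_steps)
{
    if (bmc_send_reset(bmc) != 0)
        return -1;
    for (int i = 0; i < max_steps; i++) {
        if (bmc_poll_reply(bmc))
            return 0;
    }
    return -2;
}
```

The point of the sketch is the wait loop: a real conversion would have to
decide where that wait is allowed to happen and what to report if the
controller never answers.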

> 
> >
> > - Is there a reason why the ipmi_watchdog is a separate module as opposed
> >   to being called by ipmi_si?  It seems there shouldn't be an issue with
> > the watchdog always loaded, it just won't do anything until someone opens
> > it (from my understanding).  Also you would gain the ability to use the
> > shutdown/remove routines properly instead of the reboot/panic notifiers.
> 
> I'm not sure I understand this.  Why would you want it as part of
> ipmi_si?  ipmi_msghandler would be a little more logical, but IMHO still
> doesn't make sense.  It uses the IPMI interface, and the interface is
> designed to have multiple users.  Better to keep it separate because
> it's a separate function.
> 
> I also don't understand the comment about shutdown/remove instead of
> reboot/panic.  Can you elaborate on that?

So part of the problem with ipmi_watchdog is that it can't load
automatically (like a normal driver).  The reason is that it doesn't
attach to any device.  My suggestion to roll it into ipmi_si (or
ipmi_msghandler works too), was to help with the autoloading part.

To counter the argument that a customer may not want the watchdog
running, I would argue that it does nothing until someone opens it the
first time.  So autoloading shouldn't have a big downside to it.

The second part I was trying to change is to remove the panic/reboot
notifiers and instead use proper shutdown/remove functions.  This would
make it easier to use the 'struct watchdog_device' pointer as the
pointer could be embedded in the per device struct.  Though on the other
hand, there is only ever one ipmi device per system, so maybe having it
global isn't a big deal.  I was trying to think of scalability issues.
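
As a rough illustration of what I mean by embedding the pointer in a per
device struct, here is a userspace mock-up of the container_of() idiom.
The struct watchdog_device below is a cut-down stand-in, and
struct ipmi_wdt_sketch, its ifnum field and sketch_get_ifnum() are
hypothetical names, not anything from the existing driver.

```c
#include <stddef.h>

/* Userspace re-creation of the kernel's container_of() idiom. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock of the framework's device object; the real one has more fields. */
struct watchdog_device {
    unsigned int timeout;
};

/* Hypothetical per-device state for one IPMI watchdog instance. */
struct ipmi_wdt_sketch {
    int ifnum;                    /* which IPMI interface this belongs to */
    struct watchdog_device wdd;   /* embedded, so no global is needed */
};

/* A callback that is handed only the watchdog_device pointer. */
static int sketch_get_ifnum(struct watchdog_device *wdd)
{
    struct ipmi_wdt_sketch *w =
        container_of(wdd, struct ipmi_wdt_sketch, wdd);

    return w->ifnum;
}
```

The framework's watchdog_set_drvdata()/watchdog_get_drvdata() helpers would
be another way to get from the watchdog_device back to the driver's state.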

> 
> >   In addition, passing the pointer to the 'struct watchdog_device' would be
> > easier if some of those extra pieces were not there (as opposed to making
> > it a global reference).
> >
> > - What do the fasync and poll calls do for a watchdog?
> 
> The IPMI watchdog has the ability to report a pretimeout at a specific
> amount of time before the final timeout, presumably to take some action
> before the system reboots.  The fasync and poll (and read) calls let
> this be reported to the user.
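
A minimal userspace sketch of how a program might wait for such a
notification, assuming the driver is configured so that a pretimeout makes
the watchdog file descriptor readable.  The helper names wait_readable()
and consume_notification() are made up here.  They work on any descriptor,
so they can be tried on a pipe; opening the real /dev/watchdog is left out
on purpose, since the thread's understanding is that the first open is
what starts the watchdog.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the descriptor to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Consume one byte of notification data; returns 1 on success, else -1. */
static int consume_notification(int fd)
{
    char byte;

    return read(fd, &byte, 1) == 1 ? 1 : -1;
}
```

A daemon would typically call wait_readable() in its main loop next to its
regular keepalive writes, and run whatever pre-reboot action it wants once
the call returns 1.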
> 
> >
> > I'll start with that for now.
> >
> > I appreciate any feedback.  Currently we just implemented blacklisting the
> > iTCO watchdog driver to work around this problem.  I thought we could do
> > better, hence my motivation to do work in this area.
> 
> It would be nice, yes.  I'm afraid that getting all the functionality
> would take a lot of hacking on the watchdog framework or removal of
> functionality from the driver.

We can take it step by step and see how we can get there.  Again my goal
was to help a partner of ours get the ipmi_watchdog to play nice in their
system without resorting to blacklisting other watchdogs.  In addition, I
thought it would help with out-of-the-box configuration if the
ipmi_watchdog could autoload whenever the IPMI pieces are loaded in the
system, instead of having to add an entry to /etc/modprobe.d.

Thanks for the help so far!

Cheers,
Don