Hi folks,

Just had a failure this morning that didn't leave much in the way of useful logs: a gluster process started running up CPU and ignoring input. No log details, and a simple kill and restart fixed it.

A few days ago, some compute node clients connected via InfiniBand could see 5 of 6 bricks, though all the rest of the systems could see all 6. Restarting the mount (umount -l /path ; sleep 30 ; mount /path) 'fixed' it.

The problem was that no one knew there was a problem; the logs were (nearly) useless for problem determination. We had to look at the overall system.

What I'd like to request comments and thoughts on is whether we can extract an external signal of some sort upon detection of an issue. That is, in the event of a problem, an external program is run with an error number, some text, etc. Sort of like what mdadm does for MD RAID units. Alternatively, a nice simple monitoring port of some sort, which we can open and read until EOF to get the current (error) state, would be tremendously helpful.

What we are looking for is basically a way to monitor the system. Not performance monitoring, but health monitoring. Yes, we can work on a hacked-up version of this ... I've done something like this in the past. What we want is to figure out how to expose enough of what we need to create a reasonable "health" monitor for bricks.

I know there is a Nagios plugin of some sort, and other similar tools. What I am looking for is to get a discussion going on what this capability should minimally consist of.

Given the layered nature of gluster, it might be harder to pass errors up and down through translator layers. But if we could connect something to the logging system to signal important events in real time, to some place other than the log (again, the mdadm model is perfect), then we are in good shape.

I don't know if this is showing up in 3.3, though this type of monitoring capability seems to be an obvious fit going forward. Unfortunately, monitoring gluster health is something of a problem right now, and I don't see easy answers short of building a log parser. So something that lets us periodically inquire as to volume/brick health/availability would be (very) useful.

To make this concrete, I've put a few rough sketches of what I mean below my signature.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
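
1) The mdadm model. For reference: mdadm --monitor execs whatever the PROGRAM line in mdadm.conf names, passing the event name, the md device, and (where relevant) the component device as arguments. A gluster analogue would exec a handler the same way. The sketch below shows only the calling convention I'm after; the handler name and the event strings are all made up, since nothing like this exists in gluster today.

#!/usr/bin/env python
# handle-gluster-event: hypothetical handler that a future glusterd
# could exec, mdadm-style, as:
#   handle-gluster-event <EVENT> <VOLUME> [<BRICK>]
# None of these event names exist today; they only illustrate the idea.
import sys
import syslog
import smtplib
from email.mime.text import MIMEText

def main():
    if len(sys.argv) < 3:
        sys.exit("usage: handle-gluster-event EVENT VOLUME [BRICK]")
    event, volume = sys.argv[1], sys.argv[2]
    brick = sys.argv[3] if len(sys.argv) > 3 else "n/a"

    msg = "gluster event %s, volume %s, brick %s" % (event, volume, brick)
    syslog.syslog(syslog.LOG_ERR, msg)

    # Page someone on the events we care about; anything else just logs.
    if event in ("BRICK_DOWN", "SPLIT_BRAIN", "QUORUM_LOST"):
        mail = MIMEText(msg)
        mail["Subject"] = "[gluster] " + msg
        mail["From"] = "gluster@localhost"
        mail["To"] = "admins@localhost"
        s = smtplib.SMTP("localhost")
        s.sendmail("gluster@localhost", ["admins@localhost"],
                   mail.as_string())
        s.quit()

if __name__ == "__main__":
    main()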
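
2) The monitoring port. From the consumer side this is just open, read until EOF, parse. Here is a toy server showing the shape of the interaction; the report format is invented, since the whole point is that gluster doesn't expose one yet.

#!/usr/bin/env python
# Toy "health port": connect, read until EOF, parse. The report format
# here is made up; the point is the open/read/EOF interaction.
import socket
import time

PORT = 24242  # arbitrary choice for the sketch

def health_report():
    # A real implementation would ask glusterd; we fake a brick table.
    lines = ["gluster-health/0 ts=%d" % time.time(),
             "volume data state=DEGRADED",
             "brick node1:/exports/brick1 state=OK",
             "brick node2:/exports/brick2 state=DOWN err=107"]
    return "\n".join(lines) + "\n"

def serve():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        try:
            conn.sendall(health_report().encode())  # write state ...
        finally:
            conn.close()                            # ... close() is the EOF

if __name__ == "__main__":
    serve()

With something like that in place, "nc server 24242" or a two-line Nagios check gets you current state without ever touching a log file.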
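
3) For completeness, the stopgap: the log parser I'd rather not maintain. This assumes the usual "[timestamp] E [...]" severity layout of glusterfs log lines; adjust the pattern for your version.

#!/usr/bin/env python
# Stopgap log watcher: tail a glusterfs log and complain about error
# lines. The "[timestamp] E ..." layout is an assumption about the log
# format; tune the regex to what your release actually emits.
import re
import sys
import time
import syslog

ERROR = re.compile(r"^\[[^\]]+\]\s+E\s")  # severity letter after timestamp

def watch(path):
    with open(path) as f:
        f.seek(0, 2)                 # start at end of file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)
                continue
            if ERROR.search(line):
                syslog.syslog(syslog.LOG_ERR, "gluster: " + line.strip())

if __name__ == "__main__":
    watch(sys.argv[1] if len(sys.argv) > 1
          else "/var/log/glusterfs/glusterd.log")

It doesn't survive log rotation, which is exactly the kind of fragility that makes me want a real interface.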