Nagios monitoring of replicated volumes

chawkins at bplinux.com (Christopher Hawkins) · Fri, 03 Apr 2009 08:13:26 -0400 (EDT)

I wrote a script to do something similar. Here's a modified version that will verify working glusterfs mounts in general... All you need is a path that is the same on all nodes being checked and on the node performing the check, and passwordless ssh into the gluster client nodes. For testing I just made a tmp directory right inside the glusterfs mount and used that: 

#!/bin/bash

# All nodes must have the same path to this directory
TEMP_DIR=/cluster/tmp

check_node() {
  # ssh into the node and have it write its hostname into a temp file
  # in a gluster mounted directory. If we can read it from here and it's
  # correct, the node is online with 100% certainty
  SSH="ssh -q -l root -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o ConnectTimeout=5"

  FILE=$(mktemp -p "$TEMP_DIR")
  $SSH "$ip" "hostname > $FILE"

  # For any ip addresses listed in /a_node_ip_list (list them one per line,
  # read by the loop below), we need to get the hostname
  # from /etc/hosts. Make sure it's in there
  EXPECTED=$(awk -v ip="$ip" '$1 == ip {print $2; exit}' /etc/hosts)

  # An empty hostname on either side must not count as a match
  if test -n "$EXPECTED" && test "$EXPECTED" == "$(cat "$FILE")"
  then
    echo "confirmed online"
  else
    echo "not online. Call someone!"
  fi
}

echo
echo "GlusterFS status:"
echo

for ip in $(cat /a_node_ip_list)
do
  echo -n "checking $ip... "
  check_node
done

# Clean up
rm -f "$TEMP_DIR"/tmp.*

exit 0
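To hook this into Nagios, the output has to be reduced to one status line and a plugin exit code (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). Here is a rough sketch of one way to do that. It only assumes the "confirmed online" / "not online" strings printed by the script above. The function name nagios_summary and the file name check_gluster.sh are made up for illustration.

```shell
# Hypothetical wrapper: reads the per-node lines printed by the script
# above on stdin and reduces them to a single Nagios-style result.
nagios_summary() {
  local total=0 failed=0 line

  while IFS= read -r line; do
    case $line in
      *"confirmed online"*) total=$((total + 1)) ;;
      *"not online"*)       total=$((total + 1)); failed=$((failed + 1)) ;;
    esac
  done

  if [ "$total" -eq 0 ]; then
    # No node lines at all: the check itself did not run properly
    echo "GLUSTERFS UNKNOWN - no nodes were checked"
    return 3
  fi

  if [ "$failed" -gt 0 ]; then
    echo "GLUSTERFS CRITICAL - $failed of $total nodes not online"
    return 2
  fi

  echo "GLUSTERFS OK - $total of $total nodes confirmed online"
  return 0
}
```

Assuming the script is saved as check_gluster.sh, something like "./check_gluster.sh | nagios_summary; exit $?" at the end of a small wrapper would pass the status on to Nagios.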

> 
> This is an interesting topic indeed. 
> 
> I'm planning to have each server ping its AFR pair and, if one of them
> goes down, to run ls -lR on the mount the moment it comes back up.
> 
> Perhaps others can share additional ideas? 
> 
> Regards. 
> 
> 2009/4/2 Cory Meyer < cory.meyer at gmail.com > 
> 
> > Has anyone out there found a decent way to monitor GlusterFS volumes?
> > I'm currently using Nagios and Cacti to take care of basic CPU, Load,
> > Memory, and raw Disk I/O. I need to monitor GlusterFS status and make
> > sure all volumes are available.
> > 
> > My test environment is 6 servers with 6 AFR volumes, each shared
> > between 2 of those servers. All volumes are mounted on each server.
> > 
> > The checks I'm testing out so far include a simple Bash script that 
> > writes the current Unix timestamp and hostname to a file once a minute. 
> > This is done by each server on only the volumes that they store. 
> >     echo "$(uname -n):$(date +%s)" > /mnt/gluster01/CHECK_FILE
> > 
> > The Nagios NRPE daemon would then execute a Perl script on each of the
> > clients. This script goes through each of the Gluster mount points,
> > comparing the timestamps in the CHECK_FILE to the current system time
> > and alarming if the timestamp is off by more than a minute. Another test,
> > which hasn't been implemented, was checking the contents of the CHECK_FILE
> > against the data that is on the raw disk.
> > 
> > Bash code to write the timestamps, executed via cron once a minute.
> > (write_timestamps.sh) 
> > http://glusterfs.pastebin.com/m5a220a6 
> > 
> > Perl code to compare the timestamps which is executed on the client. 
> > (check_glusterfs_mounts.pl) 
> > http://glusterfs.pastebin.com/m2f057a77 
> > 
> > Any ideas/questions/comments? 
> > 
> > 
> > 
> > _______________________________________________ 
> > Gluster-users mailing list 
> > Gluster-users at gluster.org 
> > http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users 
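The timestamp comparison described in the quoted message can also be sketched in bash. This is not the linked Perl script, only a minimal illustration. It assumes the CHECK_FILE holds a single "hostname:epoch" line as written by the echo command quoted above. The function name check_file_age and its arguments are hypothetical.

```shell
# Hypothetical helper: compare the epoch stored in a "hostname:epoch"
# file with the current time and report in Nagios style.
# Usage: check_file_age FILE [MAX_AGE_SECONDS] [NOW_EPOCH]
check_file_age() {
  local file=$1 max_age=${2:-60} now=${3:-$(date +%s)}
  local line stamp age

  # An unreadable file usually means the mount is missing or broken
  if ! line=$(head -n 1 "$file" 2>/dev/null); then
    echo "CRITICAL - cannot read $file"
    return 2
  fi

  stamp=${line##*:}   # everything after the last colon
  case $stamp in
    ''|*[!0-9]*)
      echo "CRITICAL - no timestamp in $file"
      return 2
      ;;
  esac

  # Absolute difference, so clock skew in either direction is caught
  age=$((now - 10#$stamp))
  if [ "$age" -lt 0 ]; then
    age=$((-age))
  fi

  if [ "$age" -gt "$max_age" ]; then
    echo "CRITICAL - $file is off by ${age}s"
    return 2
  fi

  echo "OK - $file is off by ${age}s"
  return 0
}
```

For example, "check_file_age /mnt/gluster01/CHECK_FILE 60" would flag the mount if the file is unreadable or the timestamp is off by more than a minute.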
