Re: v3.6.1 vs v3.5.2 self heal - help (Nagios related)




Hi Vince,
This could be a behavioural change in how the heal process output is captured in the latest GlusterFS. If that is the case, we may be able to tune the interval at which Nagios collects the heal info output, or some other settings, to avoid continuous alerts. I am CCing the gluster-nagios devs.

--Humble 


On Wed, Nov 19, 2014 at 9:50 PM, Vince Loschiavo <vloschiavo@xxxxxxxxx> wrote:

Hello Gluster Community,

I have been using the Nagios monitoring scripts mentioned in the thread below on 3.5.2 with great success. The most useful of these is the self-heal check.

However, I've just upgraded to 3.6.1 in the lab and the self-heal daemon has become quite aggressive. On 3.6.1 I continually get alerts/warnings that virt disk images need self-heal, and then they clear. This is not the case on 3.5.2.

Configuration:
2-node, 2-brick replicated volume with a 2x1Gb LAG network between the peers, used as a QEMU/KVM virt image store through the FUSE mount on CentOS 6.5.

Example:
On 3.5.2:
gluster volume heal volumename info: shows the bricks and the number of entries to be healed: 0

On v3.5.2, during normal gluster operations, I can run this command over and over again, 2-4 times per second, and it always shows 0 entries to be healed. I've used this as an indicator that the bricks are synchronized.

Last night, I upgraded to 3.6.1 in the lab and I'm seeing different behavior.
Running gluster volume heal volumename info during normal operations shows a file out of sync, seemingly in the window between each block being written to disk and being synced to the peer. I can run the command over and over again, 2-4 times per second, and it almost always shows something out of sync. The individual files listed change from run to run:

Example:
1st run: shows file1 out of sync
2nd run: shows file2 and file3 out of sync, but file1 is now in sync (not in the list)
3rd run: shows file3 and file4 out of sync, but file1 and file2 are in sync (not in the list)
...
nth run: shows 0 files out of sync
(n+1)th run: shows file3 and file12 out of sync
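One way to tell transient entries from persistent ones is to take several heal info samples and keep only the entries that appear in every sample. The sketch below assumes the 3.5/3.6-era output layout (a "Brick host:/path" header, one entry per line, then a "Number of entries: N" summary); parse_heal_info and persistent_entries are hypothetical helpers for illustration, not part of the gluster-nagios plugins.

```python
import re
from typing import Dict, List, Optional, Set

# Assumed layout of `gluster volume heal <volname> info` output:
#   Brick host:/path/to/brick
#   /path/of/entry
#   Number of entries: 1
BRICK_RE = re.compile(r"^Brick\s+(\S+)")
COUNT_RE = re.compile(r"^Number of entries:\s*\d+")


def parse_heal_info(text: str) -> Dict[str, List[str]]:
    """Map each brick to the entries it reports as needing heal."""
    result: Dict[str, List[str]] = {}
    brick: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = BRICK_RE.match(line)
        if match:
            brick = match.group(1)
            result[brick] = []
        elif brick is not None and not COUNT_RE.match(line):
            # Anything else under a brick header is treated as an entry.
            result[brick].append(line)
    return result


def persistent_entries(samples: List[Dict[str, List[str]]]) -> Set[str]:
    """Return entries reported in every sample; transient ones drop out."""
    if not samples:
        return set()
    sets = [{entry for entries in s.values() for entry in entries} for s in samples]
    return set.intersection(*sets)
```

Fed the three runs described above (file1; file2 and file3; file3 and file4), the intersection is empty, which matches the observation that the volume is healthy and the entries are only in flight.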

From looking at the virtual machines running off this gluster volume, it's clear that gluster is working well. However, this plays havoc with Nagios alerting: each time Nagios runs heal info it gets a different, non-useful result and sends an alert.

Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to tune the settings or change the monitoring method to get more useful results into Nagios?
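A minimal sketch of one possible change to the monitoring method: only alert when the total number of entries to heal stays above zero for several consecutive samples. heal_status, required and critical_at are hypothetical names chosen for illustration; only the exit codes 0/1/2/3 (OK/WARNING/CRITICAL/UNKNOWN) follow the standard Nagios plugin convention.

```python
from typing import Sequence

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def heal_status(counts: Sequence[int], required: int = 3, critical_at: int = 10) -> int:
    """Map recent 'entries to heal' totals to a Nagios exit code.

    counts      -- totals from consecutive heal info runs, oldest first
    required    -- hypothetical debounce: how many of the most recent samples
                   must all be non-zero before alerting
    critical_at -- hypothetical threshold for escalating to CRITICAL
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    recent = list(counts)[-required:]
    if len(recent) < required:
        return UNKNOWN  # not enough samples collected yet
    lowest = min(recent)
    if lowest == 0:
        return OK  # at least one clean sample: treat entries as transient
    return CRITICAL if lowest >= critical_at else WARNING


if __name__ == "__main__":
    # Entries came and went, then cleared, so no alert is raised.
    print(heal_status([1, 3, 0]))  # prints 0 (OK)
```

Nagios can do similar debouncing on its own: raising max_check_attempts (with a suitable retry_interval) on the service means a non-OK result only becomes a hard state, and triggers a notification, after that many consecutive non-OK checks.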

Thank you,

-- 
-Vince Loschiavo


On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal <humble.devassy@xxxxxxxxx> wrote:
Hi Gopu,

Awesome!

We could have a Gluster blog post about this implementation.

--Humble



On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan <gopukrishnantec@xxxxxxxxx> wrote:
Thanks for all your help. I was able to configure Nagios using the GlusterFS plugin. The following link shows how I configured it; I hope it helps someone else:

http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/

On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal <humble.devassy@xxxxxxxxx> wrote:
By the way, if you are around, we have a talk on the same topic at the upcoming GlusterFS India meetup.

Details are available at:
 http://www.meetup.com/glusterfs-India/

--Humble


On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan <gopukrishnantec@xxxxxxxxx> wrote:
How can we monitor our Gluster clusters and get alerted if something goes wrong? I found some Nagios plugins, but I haven't got them working so far. I am still experimenting with them. Any suggestions would be very helpful.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users









