----- Original Message -----
> From: "Vince Loschiavo" <vloschiavo@xxxxxxxxx>
> To: "gluster-users@xxxxxxxxxxx" <Gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, November 19, 2014 9:50:50 PM
> Subject: v3.6.1 vs v3.5.2 self heal - help (Nagios related)
>
> Hello Gluster Community,
>
> I have been using the Nagios monitoring scripts mentioned in the thread
> below on 3.5.2 with great success. The most useful of these is the self
> heal check.
>
> However, I've just upgraded to 3.6.1 in the lab and the self heal daemon
> has become quite aggressive. On 3.6.1 I continually get alerts/warnings
> that virt disk images need self heal, and then they clear. This is not
> the case on 3.5.2.
>
> Configuration:
> 2-node, 2-brick replicated volume with a 2x1GB LAG network between the
> peers, using this volume as a QEMU/KVM virt image store through the FUSE
> mount on CentOS 6.5.
>
> Example on 3.5.2:
> gluster volume heal volumename info shows the bricks and the number of
> entries to be healed: 0
>
> On v3.5.2, during normal gluster operations, I can run this command over
> and over again, 2-4 times per second, and it will always show 0 entries
> to be healed. I've used this as an indicator that the bricks are
> synchronized.
>
> Last night I upgraded to 3.6.1 in the lab and I'm seeing different
> behavior. Running gluster volume heal volumename info during normal
> operations will show a file out-of-sync, seemingly between every block
> written to disk and then synced to the peer. I can run the command over
> and over again, 2-4 times per second, and it will almost always show
> something out of sync. The individual files change, meaning:
>
> Example:
> 1st run: shows file 1 out of sync
> 2nd run: shows files 2 and 3 out of sync, but file 1 is now in sync (not
> in the list)
> 3rd run: shows files 3 and 4 out of sync, but files 1 and 2 are in sync
> (not in the list)
> ...
> nth run: shows 0 files out of sync
> nth+1 run: shows files 3 and 12 out of sync
>
> From looking at the virtual machines running off this gluster volume,
> it's obvious that gluster is working well. However, this plays havoc
> with Nagios alerting: Nagios runs heal info, gets different and
> non-useful results each time, and sends alerts.
>
> Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to
> tune the settings or change the monitoring method to get better results
> into Nagios?

In 3.6.1 the heal info command works differently than it does in 3.5.2: it is the self-heal daemon that gathers the entries that might need healing. Currently, in 3.6.1, there is no way to distinguish, while listing, between a file that is being healed and a file that merely has ongoing I/O. Hence files under normal operation are also listed in the output of heal info.
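
Until that is improved, one workaround on the monitoring side is to treat a single non-empty heal info listing as noise and only alert on entries that persist across several consecutive runs. Below is a rough, untested sketch of such a check (not an official plugin); the script name, volume name, sample count, interval, and the way the heal info output lines are filtered are all assumptions you would need to adapt to your setup:

#!/usr/bin/env python
# check_gluster_heal.py -- rough sketch, not a polished Nagios plugin.
# Runs "gluster volume heal <VOL> info" several times and only raises an
# alert for entries that appear in every sample, so files that show up
# briefly because of ongoing I/O (the 3.6.1 behavior described above)
# are ignored. Volume name, sample count and interval are placeholders.

import subprocess
import sys
import time

VOLUME = "volumename"   # placeholder: your volume name
SAMPLES = 5             # how many times to run heal info
INTERVAL = 2            # seconds between samples
WARN_THRESHOLD = 1      # entries persisting across all samples -> WARNING

def heal_entries(volume):
    """Return the set of entries listed by 'gluster volume heal <vol> info'."""
    try:
        out = subprocess.check_output(
            ["gluster", "volume", "heal", volume, "info"])
    except (OSError, subprocess.CalledProcessError) as err:
        print("UNKNOWN: could not run heal info: %s" % err)
        sys.exit(3)
    entries = set()
    for line in out.decode("utf-8", "replace").splitlines():
        line = line.strip()
        # Skip blank lines, brick headers, status and counter lines;
        # keep the file/gfid entries themselves (format assumed here).
        if (not line or line.startswith("Brick ")
                or line.startswith("Status")
                or line.startswith("Number of entries")):
            continue
        entries.add(line)
    return entries

def main():
    # Intersect the samples: only entries seen in every run survive.
    persistent = heal_entries(VOLUME)
    for _ in range(SAMPLES - 1):
        time.sleep(INTERVAL)
        persistent &= heal_entries(VOLUME)
        if not persistent:
            break
    if len(persistent) >= WARN_THRESHOLD:
        print("WARNING: %d entries pending heal across %d samples"
              % (len(persistent), SAMPLES))
        sys.exit(1)
    print("OK: no entries pending heal across %d samples" % SAMPLES)
    sys.exit(0)

if __name__ == "__main__":
    main()

The idea is simply that transient entries caused by in-flight writes drop out of the intersection, while genuinely pending heals stay in it. You would want to verify the parsing against your actual heal info output before wiring it into Nagios/NRPE.
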
> Thank you,
>
> --
> -Vince Loschiavo
>
>
> On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal <
> humble.devassy@xxxxxxxxx > wrote:
>
> Hi Gopu,
>
> Awesome !!
>
> We can have a Gluster blog about this implementation.
>
> --Humble
>
>
> On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan < gopukrishnantec@xxxxxxxxx >
> wrote:
>
> Thanks for all your help... I was able to configure Nagios using the
> glusterfs plugin. The following link shows how I configured it. Hope it
> helps someone else:
> http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/
>
> On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal <
> humble.devassy@xxxxxxxxx > wrote:
>
> Hi,
>
> Please look at this thread:
> http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html
>
> Btw, if you are around, we have a talk on the same topic at the upcoming
> GlusterFS India meetup.
>
> Details can be fetched from:
> http://www.meetup.com/glusterfs-India/
>
> --Humble
>
>
> On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan < gopukrishnantec@xxxxxxxxx >
> wrote:
>
> How can we monitor the glusters and be alerted if something goes wrong? I
> found some Nagios plugins, but they haven't worked so far. I am still
> experimenting with those. Any suggestions would be much appreciated.

--
Thanks,
Anuradha.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users