Re: False notifications


 



Hi,


On 5/14/2014 1:45 AM, Joe Julian wrote:

On 5/13/2014 10:43 PM, Sahina Bose wrote:

On 05/14/2014 07:42 AM, Miloš Kozák wrote:
Hi,
I am running a field trial of Gluster 3.5 on two servers. Each of these
servers uses one 10k RPM HDD with XFS as a brick. On top of these
bricks I have one replica 2 volume:

[root@nodef01i ~]# gluster volume info ph-fs-0

Volume Name: ph-fs-0
Type: Replicate
Volume ID: 5085e018-7c47-4d4f-8dcb-cd89ec240393
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.11.100.1:/gfs/s3-sata-10k/brick
Brick2: 10.11.100.2:/gfs/s3-sata-10k/brick
Options Reconfigured:
performance.io-thread-count: 12
network.ping-timeout: 2
performance.cache-max-file-size: 0
performance.flush-behind: on

Additionally, I am running Nagios to monitor everything, using the checks from
http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/GlusterFS-checks/details.
I extended them slightly so that they also monitor the number of split-brain
files. All of this information goes into the performance data, so
I can draw graphs from it (the graphs are attached).
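To make the monitoring idea concrete, here is a minimal Python sketch of a Nagios-style check that counts pending heal entries per brick and reports them as performance data. It assumes the 3.5-era text layout of "gluster volume heal <VOLNAME> info" (a "Brick host:/path" line followed by "Number of entries: N"). The sample output, the file path in it, the function names and the warn/crit thresholds are hypothetical, and this is not the plugin from the Nagios Exchange link above.

```python
import re

# Hypothetical heal-info output; the "Brick ..." / "Number of entries: N"
# layout is assumed from GlusterFS 3.5-era releases and may differ elsewhere.
SAMPLE = """\
Brick 10.11.100.1:/gfs/s3-sata-10k/brick/
Number of entries: 1
/images/vm01.img

Brick 10.11.100.2:/gfs/s3-sata-10k/brick/
Number of entries: 0
"""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

BRICK_RE = re.compile(r"^Brick\s+(\S+)")
COUNT_RE = re.compile(r"^Number of entries:\s*(\d+)")


def parse_heal_counts(text):
    """Return {brick: entry_count} parsed from heal-info text."""
    counts = {}
    brick = None
    for line in text.splitlines():
        m = BRICK_RE.match(line)
        if m:
            brick = m.group(1).rstrip("/")
            continue
        m = COUNT_RE.match(line)
        if m and brick is not None:
            counts[brick] = int(m.group(1))
    return counts


def nagios_result(counts, warn=1, crit=10):
    """Map per-brick counts to (exit_code, status line with perfdata)."""
    if not counts:
        return UNKNOWN, "UNKNOWN - could not parse heal info output"
    total = sum(counts.values())
    if total >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif total >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Nagios perfdata: 'label'=value;warn;crit;min (one item per brick).
    perf = " ".join(
        f"'{brick}'={n};{warn};{crit};0" for brick, n in sorted(counts.items())
    )
    return code, f"{label} - pending heal entries: {total} | {perf}"


if __name__ == "__main__":
    code, message = nagios_result(parse_heal_counts(SAMPLE))
    # A real plugin would print the message and then call sys.exit(code).
    print(message)
```

In a real check the text would come from running the gluster command (for example via subprocess) instead of the hard-coded sample; keeping parsing and status mapping separate makes each part easy to test on its own.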

My problem is that I receive quite a lot of false warnings from
Nagios during the day because there are some unsynced files (as reported by
gluster volume heal XXX info). I don't know whether it is a bug or it is caused by
my configuration. Either way it is quite disturbing, and I am afraid
that after receiving a lot of false warnings I could easily overlook an
important one.


I think the issue is that "gluster volume heal info" also
reports files undergoing I/O, in addition to files that need self-heal.
See
http://supercolony.gluster.org/pipermail/gluster-users/2014-May/040239.html
for more information on this. Pranith, please correct me if I am wrong.
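If heal info also lists files that are merely under I/O, one possible mitigation on the monitoring side is to alert only on entries that show up in several consecutive polls. The Python sketch below keeps the last few parsed snapshots and reports their intersection. The class and function names are hypothetical, and it assumes the same 3.5-era output layout ("Brick host:/path", "Number of entries: N", then one path or gfid per line). It will not help for files under continuous I/O, such as running VM images, which can appear in every poll.

```python
import re
from collections import deque

BRICK_RE = re.compile(r"^Brick\s+(\S+)")
# Header lines that are not entries (assumed layout; a "Status:" line may
# appear in later releases and is skipped defensively).
SKIP_RE = re.compile(r"^(Number of entries:|Status:)")


def parse_heal_entries(text):
    """Return the set of (brick, entry) pairs listed in heal-info text."""
    entries = set()
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or SKIP_RE.match(line):
            continue
        m = BRICK_RE.match(line)
        if m:
            brick = m.group(1).rstrip("/")
        elif brick is not None:
            entries.add((brick, line))
    return entries


class PersistentHealFilter:
    """Hypothetical helper: keep only entries seen in each of the last N polls."""

    def __init__(self, runs=3):
        if runs < 1:
            raise ValueError("runs must be >= 1")
        self.history = deque(maxlen=runs)

    def update(self, text):
        """Record one heal-info snapshot; return entries present in all of the last N."""
        self.history.append(parse_heal_entries(text))
        if len(self.history) < self.history.maxlen:
            return set()  # not enough history yet, so stay quiet
        return set.intersection(*self.history)


if __name__ == "__main__":
    f = PersistentHealFilter(runs=2)
    f.update("Brick h1:/b/\nNumber of entries: 2\n/a\n/b\n")
    print(f.update("Brick h1:/b/\nNumber of entries: 1\n/a\n"))
```

Usage: keep one PersistentHealFilter per volume, feed it the heal info text on every check interval, and raise a warning only when update() returns a non-empty set. The trade-off is a delay of (runs - 1) check intervals before a genuine problem is reported.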


That's what I've seen as well.

On another note, we are also developing Nagios plugins that can be
used to monitor the various entities and services in the Gluster
cluster. The repositories are here:

gluster-nagios-addons -
http://review.gluster.org/#/admin/projects/gluster-nagios-addons
nagios-server-addons -
http://review.gluster.org/#/admin/projects/nagios-server-addons

We will be putting together a short doc on these soon. Meanwhile,
please feel free to check them out and give us your valuable feedback.


I walked through your source code and, to the best of my knowledge, this is not a real GlusterFS add-on; it is a third-party monitoring "daemon", or a collection of scripts, that monitors the cluster and reports to Nagios. But it has the same problem with self-heal reporting.

Basically, this can only be resolved once Pranith fixes the heal info output. In the meantime I am planning to write a log parser; it is not the greatest solution, but I need it.





network.ping-timeout is set to 2 because I cannot allow the VM servers
to hang for 2 x 42 seconds when the other node is rebooted (we have a kind of
reboot policy).

Thanks for help,
Milos






_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users







