Hi,

I am running a field trial of Gluster 3.5 on two servers. Each server uses one 10k HDD with XFS as a brick, and on top of these bricks I have one replica 2 volume:
[root@nodef01i ~]# gluster volume info ph-fs-0

Volume Name: ph-fs-0
Type: Replicate
Volume ID: 5085e018-7c47-4d4f-8dcb-cd89ec240393
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.11.100.1:/gfs/s3-sata-10k/brick
Brick2: 10.11.100.2:/gfs/s3-sata-10k/brick
Options Reconfigured:
performance.io-thread-count: 12
network.ping-timeout: 2
performance.cache-max-file-size: 0
performance.flush-behind: on

Additionally, I am running Nagios to monitor everything, using http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/GlusterFS-checks/details. I improved the plugin slightly so that it also monitors the number of split-brain files, and all this information goes into the performance data, so I can draw graphs from it (the graphs are in the attachments).
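For context, the relevant part of such a check boils down to counting the heal-info entries. A minimal sketch (illustrative only, not the exact plugin code; the warning threshold is an arbitrary example):

#!/bin/bash
# Sketch: count entries reported by "gluster volume heal <vol> info"
# and emit a Nagios status line with perfdata.
VOL="ph-fs-0"
WARN=1

# Heal info prints one "Number of entries: N" line per brick; sum them.
UNSYNCED=$(gluster volume heal "$VOL" info 2>/dev/null \
  | awk -F': ' '/^Number of entries/ {sum += $2} END {print sum+0}')

if [ "$UNSYNCED" -ge "$WARN" ]; then
    echo "WARNING: $UNSYNCED unsynced entries on $VOL | unsynced=$UNSYNCED"
    exit 1
fi
echo "OK: no unsynced entries on $VOL | unsynced=$UNSYNCED"
exit 0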
My problem is that I receive quite a lot of false warnings from Nagios during the day, because some unsynced files show up in "gluster volume heal XXX info". I don't know whether this is a bug or whether it is caused by my configuration. Either way it is quite disturbing, and I am afraid that after receiving many false warnings I might overlook an important one.
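One workaround I am considering, assuming the entries are only transient (e.g. files that are being written at the moment the check runs): only warn when the same paths appear in two consecutive checks. A sketch (the state-file location is an arbitrary choice):

#!/bin/bash
# Sketch: suppress transient heal-info entries by requiring them to
# persist across two consecutive runs of the check.
VOL="ph-fs-0"
STATE="/var/tmp/gluster_heal_${VOL}.prev"

# Collect the currently-unsynced paths (heal info lists them starting with "/").
CURRENT=$(gluster volume heal "$VOL" info 2>/dev/null | grep '^/' | sort -u)

PREV=""
[ -f "$STATE" ] && PREV=$(cat "$STATE")
echo "$CURRENT" > "$STATE"

# Warn only on paths present in both the previous and the current run.
PERSISTENT=$(comm -12 <(echo "$PREV") <(echo "$CURRENT") | grep -c '^/')

if [ "$PERSISTENT" -gt 0 ]; then
    echo "WARNING: $PERSISTENT entries unsynced across two checks | persistent=$PERSISTENT"
    exit 1
fi
echo "OK: no persistent unsynced entries | persistent=$PERSISTENT"
exit 0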
network.ping-timeout is set to 2 because I cannot allow the VM servers to hang for 2x42 sec when the other node is rebooted (we have a reboot policy).
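For completeness, the timeout was lowered from the 42-second default with:

# lower the ping timeout so clients fail over quickly on node reboot
gluster volume set ph-fs-0 network.ping-timeout 2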
Thanks for the help,
Milos
Attachment: nodef01i.czprg-GFS ph-fs-0.png (PNG image)
Attachment: nodef01i.czprg-GFS ph-fs-0-unsync.png (PNG image)