Re: False notifications

Sahina Bose <sabose@xxxxxxxxxx> · Wed, 14 May 2014 11:13:23 +0530



    On 05/14/2014 07:42 AM, Miloš Kozák wrote:

    
    Hi,
      

      I am running a field trial of Gluster 3.5 on two servers. Each of
      these servers uses one 10k HDD with XFS as a brick. On top of
      these bricks I have one replica 2 volume:
      

      [root@nodef01i ~]# gluster volume info ph-fs-0
      Volume Name: ph-fs-0
      Type: Replicate
      Volume ID: 5085e018-7c47-4d4f-8dcb-cd89ec240393
      Status: Started
      Number of Bricks: 1 x 2 = 2
      Transport-type: tcp
      Bricks:
      Brick1: 10.11.100.1:/gfs/s3-sata-10k/brick
      Brick2: 10.11.100.2:/gfs/s3-sata-10k/brick
      Options Reconfigured:
      performance.io-thread-count: 12
      network.ping-timeout: 2
      performance.cache-max-file-size: 0
      performance.flush-behind: on

      Additionally, I am running nagios to monitor everything, using
      http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/GlusterFS-checks/details.
      I improved it slightly so that it also monitors the number of
      split-brain files, and all this information goes into the
      performance data, so I can draw pictures out of it (these pictures
      are in the attachment).
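
For reference, a Nagios plugin reports performance data after a `|` on its output line, as `label=value;warn;crit` fields. The following is only a minimal sketch of how the unsynced and split-brain counts could be emitted that way. The function name, labels and thresholds are hypothetical and are not taken from the plugin linked above.

```python
# Hypothetical helper: names, labels and thresholds are illustrative only.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_TEXT = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def format_check(volume, unsynced, split_brain, warn=1, crit=10):
    """Return (exit_code, output_line) in Nagios plugin format."""
    # Any split-brain file is treated as critical in this sketch.
    if split_brain > 0 or unsynced >= crit:
        code = CRITICAL
    elif unsynced >= warn:
        code = WARNING
    else:
        code = OK
    message = "%s - volume %s: %d unsynced, %d split-brain" % (
        STATUS_TEXT[code], volume, unsynced, split_brain)
    # Performance data: label=value;warn;crit (no warn level for split_brain).
    perfdata = "unsynced=%d;%d;%d split_brain=%d;;1" % (
        unsynced, warn, crit, split_brain)
    return code, "%s | %s" % (message, perfdata)
```

A plugin would print the returned line and exit with the returned code, for example `code, line = format_check("ph-fs-0", 3, 0)`.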
      

      My problem is that I am receiving quite a lot of false warnings
      from nagios during the day because there are some unsynced files
      (gluster volume heal XXX info). I don't know whether it is a bug
      or is caused by my configuration. Either way it is quite
      disturbing, and I am afraid that after receiving a lot of false
      warnings I could overlook an important one.
      

    I think the issue is that "gluster volume heal info" also
    reports files undergoing I/O, in addition to files that need
    self-heal. See
    http://supercolony.gluster.org/pipermail/gluster-users/2014-May/040239.html
    for more information on this. Pranith, please correct me if I am wrong.
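
One way a check could reduce such false warnings is to alert only on entries that show up in two consecutive samples of the heal info output, since files that were merely being written to tend to disappear from the list again. Below is a minimal sketch of that idea. It assumes the output consists of a `Brick ...` line per brick, a `Number of entries:` line, and one entry per line starting with `/` or `<gfid:`. The function names are hypothetical, and this is not how the plugins mentioned in this thread work. A file under sustained I/O, such as a busy VM image, could still appear in both samples.

```python
def parse_heal_info(text):
    """Map each brick to the set of entries listed under it.

    Assumes 'Brick <host>:<path>' header lines followed by entries
    that start with '/' or '<gfid:'; all other lines are ignored.
    """
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = set()
        elif current is not None and line.startswith(("/", "<gfid:")):
            bricks[current].add(line)
    return bricks


def persistent_entries(previous, current):
    """Return entries listed on the same brick in both samples."""
    result = {}
    for brick, entries in current.items():
        common = entries & previous.get(brick, set())
        if common:
            result[brick] = common
    return result
```

The check would keep the parsed result of the previous run (for example in a state file) and warn only when `persistent_entries` returns a non-empty mapping.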

    
    On another note, we are also developing Nagios plugins that can be
    used to monitor the various entities and services in a gluster
    cluster. The repositories are here:

    
    gluster-nagios-addons -
    http://review.gluster.org/#/admin/projects/gluster-nagios-addons

    nagios-server-addons -
    http://review.gluster.org/#/admin/projects/nagios-server-addons

    
    We will be putting together a short doc on these soon. Meanwhile,
    please feel free to check them out and give us your valuable feedback.

    
      network.ping-timeout is set to 2 because I cannot allow VM
      servers to hang for 2x42sec when the other node is rebooted (we
      have some kind of reboot policy).
      

      Thanks for help,
      

      Milos
      

      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
    
    