Re: False notifications

Joe Julian <joe@xxxxxxxxxxxxxxxx> · Tue, 13 May 2014 22:45:15 -0700



    On 5/13/2014 10:43 PM, Sahina Bose wrote:

    
      On 05/14/2014 07:42 AM, Miloš Kozák wrote:

      
      Hi, 

        I am running a field trial of Gluster 3.5 on two servers. Each
        server uses one 10k HDD formatted with XFS as a brick. On top of
        these bricks I have one replica 2 volume:

        
        [root@nodef01i ~]# gluster volume info ph-fs-0

        Volume Name: ph-fs-0
        Type: Replicate
        Volume ID: 5085e018-7c47-4d4f-8dcb-cd89ec240393
        Status: Started
        Number of Bricks: 1 x 2 = 2
        Transport-type: tcp
        Bricks:
        Brick1: 10.11.100.1:/gfs/s3-sata-10k/brick
        Brick2: 10.11.100.2:/gfs/s3-sata-10k/brick
        Options Reconfigured:
        performance.io-thread-count: 12
        network.ping-timeout: 2
        performance.cache-max-file-size: 0
        performance.flush-behind: on

        
        Additionally, I am running Nagios to monitor everything, using
        http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/GlusterFS-checks/details.
        I improved it slightly so that it also monitors the number of
        split-brain files, and all this information goes into the
        performance data, so I can draw graphs from it (the graphs are
        attached).
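A split-brain counter of the kind described above could look roughly like the sketch below. It is not the plugin's actual code. It assumes that the heal-info output (`gluster volume heal <VOLUME> info`, or `... info split-brain`) prints one `Number of entries: N` line per brick, which should be checked against your Gluster version. It also uses the standard Nagios exit codes and `label=value;warn;crit;min` perfdata convention. The names `count_entries`, `nagios_result` and `SAMPLE`, the thresholds, and the sample output are all hypothetical.

```python
import re

# Assumption: heal-info output prints one "Number of entries: N" line per brick.
# Verify this against the output of your own Gluster version.
ENTRIES_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$", re.MULTILINE)

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_entries(heal_info_output: str) -> list[int]:
    """Return the per-brick entry counts found in one heal-info run."""
    return [int(n) for n in ENTRIES_RE.findall(heal_info_output)]


def nagios_result(label: str, heal_info_output: str,
                  warn: int = 1, crit: int = 10) -> tuple[int, str]:
    """Build a Nagios exit code and a status line carrying performance data.

    The warn/crit thresholds are illustrative defaults, not recommendations.
    """
    counts = count_entries(heal_info_output)
    if not counts:
        return UNKNOWN, f"UNKNOWN - could not parse {label} output"
    total = sum(counts)
    if total >= crit:
        state, code = "CRITICAL", CRITICAL
    elif total >= warn:
        state, code = "WARNING", WARNING
    else:
        state, code = "OK", OK
    # Text after "|" is perfdata: label=value;warn;crit;min
    return code, f"{state} - {total} {label} entries | {label}={total};{warn};{crit};0"


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Brick 10.11.100.1:/gfs/s3-sata-10k/brick
Number of entries: 2
/images/vm1.img
/images/vm2.img

Brick 10.11.100.2:/gfs/s3-sata-10k/brick
Number of entries: 0
"""

code, status = nagios_result("split_brain", SAMPLE)
print(status)  # WARNING - 2 split_brain entries | split_brain=2;1;10;0
```

In a real check, the input would come from something like `subprocess.run(["gluster", "volume", "heal", "ph-fs-0", "info"], capture_output=True, text=True).stdout`. The script would then print `status` and call `sys.exit(code)`, so that Nagios picks up both the state and the graphable perfdata.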

        
        My problem is that I receive quite a lot of false warnings
        from Nagios during the day because there are some unsynced files
        (gluster volume heal XXX info). I don't know whether it is a bug or
        whether it is caused by my configuration. Either way it is quite
        disturbing, and I am afraid that after receiving so many false
        warnings I could just overlook an important one.

      
      I think the issue is that "gluster volume heal info" also
      reports files undergoing I/O, in addition to files that need
      self-heal. See http://supercolony.gluster.org/pipermail/gluster-users/2014-May/040239.html
      for more information on this. Pranith, please correct me if I am wrong.
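Because heal info also lists files that merely have I/O in flight, one possible mitigation is to take two heal-info samples some seconds apart and alert only on entries present in both. This mitigation is not proposed in the thread. The sketch below is a minimal version under stated assumptions: brick sections start with a `Brick <host>:<path>` line, and entries are lines beginning with `/` or `<gfid:`. The function names and sample outputs are hypothetical.

```python
import re

# Assumption: each brick section starts with a "Brick <host>:<path>" line, and
# listed entries are lines starting with "/" (paths) or "<gfid:" (GFIDs).
BRICK_RE = re.compile(r"^Brick\s+(\S+)$")


def parse_entries(heal_info_output: str) -> set[tuple[str, str]]:
    """Return the (brick, entry) pairs listed in one heal-info run."""
    entries: set[tuple[str, str]] = set()
    brick = None
    for raw in heal_info_output.splitlines():
        line = raw.strip()
        match = BRICK_RE.match(line)
        if match:
            brick = match.group(1)
        elif brick is not None and line.startswith(("/", "<gfid:")):
            entries.add((brick, line))
    return entries


def persistent_entries(first: str, second: str) -> set[tuple[str, str]]:
    """Keep only entries seen in both runs; the rest were likely in-flight I/O."""
    return parse_entries(first) & parse_entries(second)


# Two hypothetical heal-info runs taken some seconds apart.
FIRST = """\
Brick 10.11.100.1:/gfs/s3-sata-10k/brick
Number of entries: 2
/images/vm1.img
/logs/app.log

Brick 10.11.100.2:/gfs/s3-sata-10k/brick
Number of entries: 0
"""
SECOND = """\
Brick 10.11.100.1:/gfs/s3-sata-10k/brick
Number of entries: 1
/images/vm1.img

Brick 10.11.100.2:/gfs/s3-sata-10k/brick
Number of entries: 0
"""

print(persistent_entries(FIRST, SECOND))
# {('10.11.100.1:/gfs/s3-sata-10k/brick', '/images/vm1.img')}
```

A file under continuous writes, such as a running VM image, can still appear in both runs. This approach therefore narrows the false positives rather than eliminating them. Nagios's `max_check_attempts` service setting gives a similar damping at the scheduler level. A service only enters a HARD non-OK state, and by default only then notifies, after that many consecutive non-OK results.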

      
    That's what I've seen as well.

    
      On another note, we are also developing Nagios plugins that can be
      used to monitor the various entities and services in a Gluster
      cluster. The repositories are here:

      
      gluster-nagios-addons - http://review.gluster.org/#/admin/projects/gluster-nagios-addons

      nagios-server-addons - http://review.gluster.org/#/admin/projects/nagios-server-addons

      
      We will be putting together a short doc on these soon. Meanwhile,
      please feel free to check them out and give us your valuable
      feedback.

      
        network.ping-timeout is set to 2 because I cannot allow the VM
        servers to hang for 2x42 seconds when the other node is rebooted
        (we have some kind of reboot policy).

        
        Thanks for your help,

        Milos 

        
        _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
      
      