----- Original Message -----
> From: "Sahina Bose" <sabose@xxxxxxxxxx>
> To: "Milos Kozak" <milos.kozak@xxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Friday, May 23, 2014 3:23:53 PM
> Subject: Re: False notifications
>
> On 05/20/2014 07:12 PM, Milos Kozak wrote:
> >
> > On 5/14/2014 1:43 AM, Sahina Bose wrote:
> >>
> >> On 05/14/2014 07:42 AM, Miloš Kozák wrote:
> >>> Hi,
> >>> I am running a field trial of Gluster 3.5 on two servers. These two
> >>> servers use one 10k HDD each with XFS as a brick. On top of these
> >>> bricks I have one replica 2 volume:
> >>>
> >>> [root@nodef01i ~]# gluster volume info ph-fs-0
> >>>
> >>> Volume Name: ph-fs-0
> >>> Type: Replicate
> >>> Volume ID: 5085e018-7c47-4d4f-8dcb-cd89ec240393
> >>> Status: Started
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 10.11.100.1:/gfs/s3-sata-10k/brick
> >>> Brick2: 10.11.100.2:/gfs/s3-sata-10k/brick
> >>> Options Reconfigured:
> >>> performance.io-thread-count: 12
> >>> network.ping-timeout: 2
> >>> performance.cache-max-file-size: 0
> >>> performance.flush-behind: on
> >>>
> >>> Additionally, I am running Nagios to monitor everything, using
> >>> http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/GlusterFS-checks/details.
> >>> I improved it slightly so that I also monitor the number of split-brain
> >>> files, and all this information goes into the performance data, so I can
> >>> draw charts from it (attached).
> >>>
> >>> My problem is that I receive quite a lot of false warnings from
> >>> Nagios during the day because there are some unsynced files (gluster
> >>> volume heal XXX info). I don't know whether it is a bug or whether it is
> >>> caused by my configuration. Either way it is quite disturbing, and I am
> >>> afraid that after receiving many false warnings I might end up missing
> >>> an important one...
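[For context, the count that such a Nagios check alerts on boils down to summing the "Number of entries" lines printed by `gluster volume heal <volname> info`. A minimal sketch of that logic follows; the inlined sample output and the warning threshold are made up for illustration, and a real check would pipe in the live command output instead.]

```shell
#!/bin/sh
# Illustrative sketch only: count heal-pending entries the way a Nagios
# check might. The sample heal-info output below is hardcoded so the
# sketch runs without a gluster install; substitute
#   gluster volume heal ph-fs-0 info
# for the printf in a real check.
heal_info_output='Brick 10.11.100.1:/gfs/s3-sata-10k/brick
/some/file/under/io
Number of entries: 1

Brick 10.11.100.2:/gfs/s3-sata-10k/brick
Number of entries: 0'

# Sum the per-brick "Number of entries" counts.
entries=$(printf '%s\n' "$heal_info_output" |
    awk -F': ' '/^Number of entries:/ { total += $2 } END { print total+0 }')

# Nagios convention: exit 0 for OK, 1 for WARNING; threshold is arbitrary here.
warn_threshold=1
if [ "$entries" -gt "$warn_threshold" ]; then
    echo "WARNING - $entries entries pending heal"
    exit 1
fi
echo "OK - $entries entries pending heal"   # prints: OK - 1 entries pending heal
```

[Note that on 3.5.0 this count also includes files merely undergoing I/O, which is exactly the false-warning problem described above.]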
> >>
> >> I think the issue is that "gluster volume heal info" also
> >> reports files undergoing I/O, in addition to files that need
> >> self-heal. See
> >> http://supercolony.gluster.org/pipermail/gluster-users/2014-May/040239.html
> >> for more information on this. Pranith, please correct me if I am wrong.
> >>
> > It makes sense, but it is quite inconvenient to have to check the logs
> > to be sure what is actually I/O and what is healing, so I support this
> > initiative! Do you have any idea when it is going to be implemented?
> >
> Pranith?

For 3.5.1 this is improved. Now "gluster volume heal <volname> info" can
distinguish I/O from self-heal for writes/truncates. There are still some
corner cases for metadata (ownership/permissions etc.) and entry
(create/unlink/rename/mkdir/rmdir/symlink/link) operations where it may
show entries in the output even though it is just I/O, but you should see
a significant improvement with this release.

Pranith

> >
> >> On another note, we are also developing Nagios plugins that can be
> >> used to monitor the various entities and services in a Gluster
> >> cluster. The repositories are here:
> >>
> >> gluster-nagios-addons -
> >> http://review.gluster.org/#/admin/projects/gluster-nagios-addons
> >> nagios-server-addons -
> >> http://review.gluster.org/#/admin/projects/nagios-server-addons
> >>
> > These projects also look very interesting. I was googling, but I didn't
> > find how to install the addons alongside GlusterFS. Can you please give
> > me a hint? I would like to install and test them, and maybe I can write
> > some patches.
> >
>
> Have pushed a patch with instructions - http://review.gluster.org/#/c/7846/.
> Please check it out and let us know. We look forward to your
> contributions!
>
> thanks
> sahina

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users