Re: v3.6.1 vs v3.5.2 self heal - help (Nagios related)

On 11/22/2014 10:12 PM, Vince Loschiavo wrote:
> Thank you for that information.
>
> Are there plans to restore the previous functionality in a later release of 3.6.x? Or is this what we should expect going forward?

Yes, it will definitely be fixed. Wait for the next release; things should be fine.

Pranith



On Thu, Nov 20, 2014 at 11:24 PM, Anuradha Talur <atalur@xxxxxxxxxx> wrote:


----- Original Message -----
> From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
> To: "Anuradha Talur" <atalur@xxxxxxxxxx>, "Vince Loschiavo" <vloschiavo@xxxxxxxxx>
> Cc: "gluster-users@xxxxxxxxxxx" <Gluster-users@xxxxxxxxxxx>
> Sent: Friday, November 21, 2014 12:06:27 PM
> Subject: Re: v3.6.1 vs v3.5.2 self heal - help (Nagios related)
>
>
>
> On November 20, 2014 10:01:45 PM PST, Anuradha Talur <atalur@xxxxxxxxxx>
> wrote:
> >
> >
> >----- Original Message -----
> >> From: "Vince Loschiavo" <vloschiavo@xxxxxxxxx>
> >> To: "gluster-users@xxxxxxxxxxx" <Gluster-users@xxxxxxxxxxx>
> >> Sent: Wednesday, November 19, 2014 9:50:50 PM
> >> Subject: v3.6.1 vs v3.5.2 self heal - help (Nagios related)
> >>
> >>
> >> Hello Gluster Community,
> >>
> >> I have been using the Nagios monitoring scripts, mentioned in the
> >> thread below, on 3.5.2 with great success. The most useful of these
> >> is the self-heal check.
> >>
> >> However, I've just upgraded to 3.6.1 in the lab and the self-heal
> >> daemon has become quite aggressive. I continually get alerts/warnings
> >> on 3.6.1 that virt disk images need self-heal, which then clear. This
> >> is not the case on 3.5.2.
> >>
> >> Configuration:
> >> 2-node, 2-brick replicated volume with a 2x1GB LAG network between
> >> the peers, using this volume as a QEMU/KVM virt image store through
> >> the fuse mount on CentOS 6.5.
> >>
> >> Example:
> >> On 3.5.2:
> >> gluster volume heal volumename info: shows the bricks and the number
> >> of entries to be healed: 0
> >>
> >> On v3.5.2, during normal gluster operations, I can run this command
> >> over and over again, 2-4 times per second, and it will always show 0
> >> entries to be healed. I've used this as an indicator that the bricks
> >> are synchronized.
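A minimal sketch of that kind of check, in Python, is below. It is illustrative
only, not the actual script from the thread Vince references; the volume name
is a placeholder, and it assumes heal info prints a "Number of entries: N"
line per brick.

#!/usr/bin/env python
# Minimal sketch of a Nagios-style heal check (illustrative).
# Assumes "gluster volume heal <vol> info" prints one
# "Number of entries: N" line per brick.
import subprocess
import sys

VOLUME = "volumename"  # placeholder volume name


def heal_entry_count(volume):
    """Total number of entries heal info reports across all bricks."""
    out = subprocess.check_output(
        ["gluster", "volume", "heal", volume, "info"])
    total = 0
    for line in out.decode("utf-8", "replace").splitlines():
        if line.strip().startswith("Number of entries:"):
            total += int(line.split(":")[1])
    return total


if __name__ == "__main__":
    try:
        count = heal_entry_count(VOLUME)
    except (OSError, subprocess.CalledProcessError) as err:
        print("UNKNOWN - could not run heal info: %s" % err)
        sys.exit(3)   # Nagios UNKNOWN
    if count == 0:
        print("OK - 0 entries need healing")
        sys.exit(0)   # Nagios OK
    print("WARNING - %d entries need healing" % count)
    sys.exit(1)       # Nagios WARNING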
> >>
> >> Last night, I upgraded to 3.6.1 in the lab and I'm seeing different
> >> behavior. Running gluster volume heal volumename info during normal
> >> operations will show a file out of sync, seemingly between every
> >> block written to disk and then synced to the peer. I can run the
> >> command over and over again, 2-4 times per second, and it will almost
> >> always show something out of sync. The individual files change,
> >> meaning:
> >>
> >> Example:
> >> 1st run: shows file 1 out of sync
> >> 2nd run: shows file 2 and file 3 out of sync, but file 1 is now in
> >> sync (not in the list)
> >> 3rd run: shows file 3 and file 4 out of sync, but files 1 and 2 are
> >> in sync (not in the list)
> >> ...
> >> nth run: shows 0 files out of sync
> >> nth+1 run: shows files 3 and 12 out of sync
> >>
> >> From looking at the virtual machines running off this gluster volume,
> >> it's obvious that gluster is working well. However, this plays havoc
> >> with Nagios alerting: Nagios runs heal info, gets different and
> >> non-useful results each time, and sends alerts.
> >>
> >> Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to
> >> tune the settings or change the monitoring method to get better
> >> results into Nagios?
> >>
> >In 3.6.1, the way the heal info command works is different from that
> >in 3.5.2: it is now the self-heal daemon that gathers the entries that
> >might need healing. Currently, in 3.6.1, there is no way to
> >distinguish, while listing, between a file that is being healed and a
> >file with ongoing I/O. Hence files under normal operation also appear
> >in the output of the heal info command.
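One workaround purely on the monitoring side, sketched below, is to sample
heal info a few times and alert only on entries that appear in every sample,
since entries caused by in-flight I/O usually disappear between samples. This
is only an illustration of the idea, not something proposed in this thread;
the volume name, sample count, and delay are placeholders.

# Sketch of a "debounced" heal check: intersect several heal info samples
# and report only entries that persist across all of them. Entries caused
# by in-flight I/O tend to vanish between samples; genuinely unhealed
# files tend to stay.
import subprocess
import time


def heal_entries(volume):
    """Set of entry lines listed by 'gluster volume heal <vol> info'."""
    raw = subprocess.check_output(
        ["gluster", "volume", "heal", volume, "info"])
    entries = set()
    for line in raw.decode("utf-8", "replace").splitlines():
        line = line.strip()
        if not line or line.startswith(
                ("Brick ", "Number of entries:", "Status:")):
            continue  # skip headings and counters, keep only listed entries
        entries.add(line)
    return entries


def persistent_entries(volume, samples=3, delay=5.0):
    """Entries present in every one of several heal info samples."""
    result = heal_entries(volume)
    for _ in range(samples - 1):
        time.sleep(delay)
        result &= heal_entries(volume)
        if not result:
            break
    return result


if __name__ == "__main__":
    stuck = persistent_entries("volumename")  # placeholder volume name
    if stuck:
        print("WARNING - %d entries needed healing in every sample" % len(stuck))
    else:
        print("OK - no persistent heal entries")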
>
> How did that regression pass?!
Test cases to check this condition were not written in the regression tests.
>

--
Thanks,
Anuradha.



--
-Vince Loschiavo


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users

