Re: [Gluster-devel] Query on healing process

ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx> · Thu, 3 Mar 2016 11:14:55 +0530

Hi Ravi,

Following up on our earlier discussion, I investigated this issue and found that healing is not triggered because the "gluster volume heal c_glusterfs info split-brain" command does not list any entries, even though the file is in split-brain.

So I manually deleted the gfid entry of that file from the .glusterfs directory and followed the instructions at the following link to heal it:

https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md

and this works fine for me.
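
For reference, the gfid entry mentioned above sits under the brick's .glusterfs directory, in subdirectories named after the first two and the next two hex digits of the file's gfid. Below is a minimal Python sketch of that mapping, assuming the gfid is read from the trusted.gfid xattr in hex form. The function name, the brick path /brick and the sample gfid are hypothetical.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root: str, gfid_hex: str) -> PurePosixPath:
    """Map a trusted.gfid hex value to its entry under <brick>/.glusterfs."""
    # `getfattr -e hex` prints values with a leading "0x"; strip it if present.
    digits = gfid_hex.strip()
    if digits.lower().startswith("0x"):
        digits = digits[2:]
    # Canonical 8-4-4-4-12 text form of the 16-byte gfid.
    gfid = str(uuid.UUID(hex=digits))
    return PurePosixPath(brick_root) / ".glusterfs" / gfid[:2] / gfid[2:4] / gfid


# Hypothetical brick path and gfid, for illustration only.
print(gfid_backend_path("/brick", "0x307a5c9efddd4e7c96e94fd4bcdcbd1b"))
# /brick/.glusterfs/30/7a/307a5c9e-fddd-4e7c-96e9-4fd4bcdcbd1b
```

The sketch only computes the path; it does not touch the brick.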

But my question is: why does the split-brain command not show any file in its output?
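
One thing that may be worth checking here (an assumption, not a confirmed diagnosis) is what the pending counters on the two bricks say. The Python sketch below is a simplified model of a replica 2 volume: it assumes the (data, metadata, entry) counters from each brick's trusted.afr.* xattr for the other brick have already been decoded, and it reports split-brain for a counter type only when both bricks blame each other for it. The function and variable names are hypothetical.

```python
from typing import Dict, Tuple

Pending = Tuple[int, int, int]  # (data, metadata, entry) pending counters
KINDS = ("data", "metadata", "entry")


def classify_pending(brick0_blames_brick1: Pending,
                     brick1_blames_brick0: Pending) -> Dict[str, str]:
    """Classify each counter type for a two-brick replica (simplified model)."""
    states = {}
    for kind, fwd, rev in zip(KINDS, brick0_blames_brick1, brick1_blames_brick0):
        if fwd and rev:
            # Each brick accuses the other: no unambiguous heal source.
            states[kind] = "split-brain"
        elif fwd or rev:
            # Only one brick accuses the other: a normal pending heal.
            states[kind] = "heal-pending"
        else:
            states[kind] = "clean"
    return states


# All counters zero on both bricks: nothing is flagged at all.
print(classify_pending((0, 0, 0), (0, 0, 0)))
```

Under this model, if every counter is zero on both bricks, the file is marked neither as pending heal nor as split-brain, even if its contents differ between the bricks.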

I am attaching all the logs that I collected from the node, along with the output of the commands from both boards.

The tar file contains two directories:

000300 - logs for the board that has been running continuously
002500 - logs for the board that was rebooted

I am waiting for your reply; please help me out with this issue.

Thanks in advance.

Regards,
Abhishek

On Fri, Feb 26, 2016 at 1:21 PM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx> wrote:
On Fri, Feb 26, 2016 at 10:28 AM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

    On 02/26/2016 10:10 AM, ABHISHEK PALIWAL wrote:

      Yes correct

    Okay, so when you say the files are not in sync until some time, are
    you getting stale data when accessing from the mount?

    I'm not able to figure out why heal info shows zero when the files
    are not in sync, despite all IO happening from the mounts. Could you
    provide the output of getfattr -d -m . -e hex /brick/file-name from
    both bricks when you hit this issue?
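
The trusted.afr.* values in that getfattr output are 12-byte hex strings holding three big-endian 32-bit pending counters (data, metadata, entry). Below is a minimal Python sketch for decoding them, assuming the output has been saved as text. The helper names and the sample attribute value are illustrative and are not taken from the attached logs.

```python
import struct
from typing import Dict, Tuple

Pending = Tuple[int, int, int]  # (data, metadata, entry)


def decode_afr_pending(hex_value: str) -> Pending:
    """Decode one trusted.afr.* value as printed by `getfattr -e hex`."""
    digits = hex_value.strip()
    if digits.lower().startswith("0x"):
        digits = digits[2:]
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    return struct.unpack(">III", raw)


def parse_getfattr(text: str) -> Dict[str, Pending]:
    """Collect the decoded trusted.afr.* attributes from getfattr output."""
    pending = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.startswith("trusted.afr."):
            pending[name] = decode_afr_pending(value)
    return pending


# Illustrative input only; the attribute value is made up.
sample = """# file: brick/file-name
trusted.afr.c_glusterfs-client-1=0x000000020000000000000000
"""
print(parse_getfattr(sample))
# {'trusted.afr.c_glusterfs-client-1': (2, 0, 0)}
```

A non-zero counter on one brick means that brick is recording operations still pending on the other brick.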

I'll provide the logs once I have them. Here, "delay" means that we power on the second board after 10 minutes.

      On Feb 26, 2016 9:57 AM, "Ravishankar N" <ravishankar@xxxxxxxxxx> wrote:

            Hello,

              On 02/26/2016 08:29 AM, ABHISHEK PALIWAL wrote:

                            Hi Ravi,

                            Thanks for the response.

                          We are using GlusterFS 3.7.8.

                          Here is the use case:

                          We have a logging file which saves the event logs
                          for every board of a node, and these files are
                          kept in sync using GlusterFS. The system is in
                          replica 2 mode, which means: when one brick in a
                          replicated volume goes offline, the glusterd
                          daemons on the other nodes keep track of all the
                          files that are not replicated to the offline
                          brick. When the offline brick becomes available
                          again, the cluster initiates a healing process,
                          replicating the updated files to that brick. But
                          in our case, we see that the log file of one
                          board is not in sync and its format is corrupted,
                          meaning the files are not in sync.

            Just to understand you correctly, you have mounted the 2
            node replica-2 volume on both these nodes and are writing to
            a logging file from the mounts, right?

                        Even the output of "gluster volume heal
                        c_glusterfs info" shows that there are no
                        pending heals.

                          Also, the logging file which is updated is of
                          fixed size, and new entries wrap around,
                          overwriting the old entries.

                            This way we have seen that after a few
                            restarts, the contents of the same file on
                            the two bricks are different, but the volume
                            heal info shows zero entries.

                      Solution:

                    But when we put a delay of more than 5 minutes
                    before the healing, everything works fine.

                  Regards,

                Abhishek

                On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

                        On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:

                                  Hi,

                                  I have a query regarding the time
                                  taken by the healing process.

                                In our current two-node setup, when we
                                rebooted one node, the self-healing
                                process started in less than 5 minutes
                                on the board, which resulted in
                                corruption of some files' data.

                       Heal should start immediately after the
                      brick process comes up. What version of gluster
                      are you using? What do you mean by corruption of
                      data? Also, how did you observe that the heal
                      started after 5 minutes?

                      -Ravi

                              To resolve it, I searched on Google
                              and found the following link:

                              https://support.rackspace.com/how-to/glusterfs-troubleshooting/

                            It mentions that the healing process can
                              take up to 10 minutes to start.

                            Here is the statement from the link:

                              "Healing replicated volumes 

                              When any brick in a replicated volume goes
                              offline, the glusterd daemons on the
                              remaining nodes keep track of all the
                              files that are not replicated to the
                              offline brick. When the offline brick
                              becomes available again, the cluster
                              initiates a healing process, replicating
                              the updated files to that brick. The
                                start of this process can take up to 10
                                minutes, based on observation." 

                            After allowing more than 5 minutes, the
                              file corruption problem was resolved.

                            So my question is: is there any way to
                              reduce the time the healing process takes
                              to start?

                            Regards,

                            Abhishek Paliwal

                        _______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

Attachment: HU37300_rep.tar.gz
Description: GNU Zip compressed data