Re: [Gluster-devel] Query on healing process

On Fri, Mar 4, 2016 at 6:36 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
On 03/04/2016 06:23 PM, ABHISHEK PALIWAL wrote:

Ok, just to confirm: glusterd and the other brick processes are running after this node rebooted?
When you run the above command, you need to check /var/log/glusterfs/glfsheal-volname.log for errors. Setting client-log-level to DEBUG would give you more verbose messages.

Yes, glusterd and the other brick processes are running fine. I have checked the /var/log/glusterfs/glfsheal-volname.log file without log-level=DEBUG. Here are the logs from that file:

[2016-03-02 13:51:39.059440] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-03-02 13:51:39.072172] W [MSGID: 101012] [common-utils.c:2776:gf_get_reserved_ports] 0-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]
[2016-03-02 13:51:39.072228] W [MSGID: 101081] [common-utils.c:2810:gf_process_reserved_ports] 0-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
[2016-03-02 13:51:39.072583] E [socket.c:2278:socket_connect_finish] 0-gfapi: connection to 127.0.0.1:24007 failed (Connection refused)

Not sure why ^^ occurs. You could try flushing iptables (iptables -F), restarting glusterd, and running the heal info command again.
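The suggested steps could be scripted roughly as follows (a sketch, to be run as root on the affected node; the `run` wrapper and `DRY_RUN` switch are additions for safety, not gluster tooling, and the service manager is assumed to be systemd):

```shell
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run iptables -F                            # flush firewall rules that may block port 24007
run systemctl restart glusterd             # restart the management daemon
run gluster volume heal c_glusterfs info   # retry the heal query
```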

No hint from the logs? I'll try your suggestion.

[2016-03-02 13:51:39.072663] E [MSGID: 104024] [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected) [Transport endpoint is not connected]
[2016-03-02 13:51:39.072700] I [MSGID: 104025] [glfs-mgmt.c:744:mgmt_rpc_notify] 0-glfs-mgmt: Exhausted all volfile servers [Transport endpoint is not connected]
# gluster volume heal c_glusterfs info split-brain
c_glusterfs: Not able to fetch volfile from glusterd
Volume heal failed.



And based on your observation I understand that this is not a split-brain problem. But is there any way to find the files that are not in split-brain yet are also not in sync?

`gluster volume heal c_glusterfs info split-brain`  should give you files that need heal.

Sorry, I meant 'gluster volume heal c_glusterfs info' should give you the files that need heal, and 'gluster volume heal c_glusterfs info split-brain' the list of files in split-brain.
The commands are detailed in https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md

Yes, I have tried this as well. It also gives "Number of entries: 0", meaning no healing is required, but the file /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml is not in sync; the two bricks show different versions of this file.

You can see it in the getfattr command output as well.


# getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
getfattr: Removing leading '/' from absolute path names
# file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
trusted.afr.c_glusterfs-client-2=0x000000000000000000000000
trusted.afr.c_glusterfs-client-4=0x000000000000000000000000
trusted.afr.c_glusterfs-client-6=0x000000000000000000000000
trusted.afr.c_glusterfs-client-8=0x000000060000000000000000 // client-8 is the latest client in our case; the first 8 hex digits (00000006) indicate pending changelog entries for data
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000001356d86c0c000217fd
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae

# lhsh 002500 getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
getfattr: Removing leading '/' from absolute path names
# file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-1=0x000000000000000000000000 // and here we can say that there is no split-brain, but the file is out of sync
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000001156d86c290005735c
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
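For reference, each trusted.afr.* value above packs three big-endian 32-bit counters of pending operations: data, metadata, and entry. A quick way to read the hex values printed by getfattr (a sketch; `decode_afr_xattr` is a hypothetical helper, not part of GlusterFS):

```python
def decode_afr_xattr(hex_value: str) -> dict:
    """Decode a 12-byte AFR changelog xattr into its three pending counters."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    assert len(raw) == 12, "AFR changelog xattr should be 12 bytes"
    return {
        "data": int.from_bytes(raw[0:4], "big"),      # pending data operations
        "metadata": int.from_bytes(raw[4:8], "big"),  # pending metadata operations
        "entry": int.from_bytes(raw[8:12], "big"),    # pending entry operations
    }

# The value shown for trusted.afr.c_glusterfs-client-8 above:
print(decode_afr_xattr("0x000000060000000000000000"))
# → {'data': 6, 'metadata': 0, 'entry': 0}
```

A non-zero data counter against a client, with all counters zero on the other brick, is consistent with a pending (not split-brain) heal on that file.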

 
Regards,
   Abhishek

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
