Re: [Gluster-devel] Query on healing process

Ravishankar N <ravishankar@xxxxxxxxxx> · Mon, 14 Mar 2016 13:37:43 +0530



    On 03/14/2016 10:36 AM, ABHISHEK PALIWAL wrote:

    
                Hi Ravishankar,

                I just want to point out that this file has some properties
                that differ from other files: it has a fixed size, and when
                there is no space left in the file, new data wraps around and
                is written from the top of the file.

                In other words, we also wrap the data within this file.

                So I would like to know whether this property of the file
                affects how Gluster identifies split-brain or maintains the
                xattr attributes.

          
    Hi,

    No, it shouldn't matter at what offset the writes happen. The xattrs
    only track that a write was missed (and that a heal is therefore
    pending), irrespective of (offset, length).

    Ravi
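
To illustrate the point above that the xattrs carry only pending counts and no (offset, length) information, here is a minimal sketch that decodes a trusted.afr.* value. It assumes the usual AFR changelog layout of three big-endian 32-bit counters (data, metadata, entry); the function name decode_afr_xattr is hypothetical and not part of any Gluster tool.

```python
import struct


def decode_afr_xattr(hex_value: str) -> dict:
    """Split a trusted.afr.* hex value into its three pending counters.

    Assumes the value is 12 bytes: data, metadata and entry counters,
    each a big-endian unsigned 32-bit integer.
    """
    # Accept the value with or without the leading "0x" that getfattr prints.
    if hex_value.lower().startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Value reported for trusted.afr.c_glusterfs-client-8 later in this thread.
print(decode_afr_xattr("0x000000060000000000000000"))
# → {'data': 6, 'metadata': 0, 'entry': 0}
```

Under that assumption, the counters say how many operations of each type are pending and nothing about where in the file the writes landed, which is why a wrapping log file should not be treated differently.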

    
          Regards,

        
        Abhishek

      
        On Fri, Mar 4, 2016 at 7:00 PM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx> wrote:

          
                On Fri, Mar 4, 2016 at 6:36 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

                    
                          On 03/04/2016 06:23 PM, ABHISHEK PALIWAL wrote:

                          
                                  OK, just to confirm: are glusterd and the
                                  brick processes running after this node
                                  rebooted?

                                  When you run the above command, you need to
                                  check whether
                                  /var/log/glusterfs/glfsheal-volname.log
                                  logs any errors. Setting client-log-level
                                  to DEBUG would give you more verbose
                                  messages.

                                  
                            Yes, glusterd and the brick processes are
                              running fine. I have checked the
                                /var/log/glusterfs/glfsheal-volname.log
                                file without setting the log level to DEBUG.
                                Here are the logs from that file:

                                
                                [2016-03-02 13:51:39.059440] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

                                [2016-03-02 13:51:39.072172] W [MSGID: 101012] [common-utils.c:2776:gf_get_reserved_ports] 0-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]

                                [2016-03-02 13:51:39.072228] W [MSGID: 101081] [common-utils.c:2810:gf_process_reserved_ports] 0-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port

                                [2016-03-02 13:51:39.072583] E [socket.c:2278:socket_connect_finish] 0-gfapi: connection to 127.0.0.1:24007 failed (Connection refused)

                              
                        Not sure why ^^ occurs.
                          You could try flushing iptables (iptables -F),
                          restarting glusterd, and running the heal info
                          command again.

                        
                  No hint from the logs? I'll try your suggestion.
                    

                                [2016-03-02 13:51:39.072663] E [MSGID: 104024] [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected) [Transport endpoint is not connected]

                                [2016-03-02 13:51:39.072700] I [MSGID: 104025] [glfs-mgmt.c:744:mgmt_rpc_notify] 0-glfs-mgmt: Exhausted all volfile servers [Transport endpoint is not connected]

                               
                                      # gluster volume heal c_glusterfs info split-brain
                                      c_glusterfs: Not able to fetch volfile from glusterd
                                      Volume heal failed.
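
The "Connection refused" error for 127.0.0.1:24007 in the log, together with "Not able to fetch volfile from glusterd", suggests that nothing was accepting connections on that address when the command ran. A minimal sketch for checking this before re-running the heal command follows; the helper name can_connect is hypothetical, and the host and port are simply the ones shown in the log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address taken from the socket_connect_finish log line in this thread.
    print(can_connect("127.0.0.1", 24007))
```

A result of False only shows that the TCP connection failed; it does not say whether the cause is a stopped daemon, a firewall rule, or something else.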
                                    
                                  
                                      Based on your observation, I
                                                understand that this is not a
                                                split-brain problem. But is
                                                there any way to find a file
                                                that is not in split-brain
                                                and is also not in sync?

                                              
                                            `gluster volume heal
                                            c_glusterfs info
                                            split-brain`  should give
                                            you files that need heal.

                                          
                          Sorry, I meant 'gluster volume heal c_glusterfs info'
                                    should give you the files that need heal,
                                    and 'gluster volume heal c_glusterfs info
                                    split-brain' the list of files in
                                    split-brain.

                                    The commands are detailed in
                                    https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md

                                  
                  Yes, I have tried this as well. It also reports
                    "Number of entries: 0", which means no healing is
                    required, but the file
                    /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
                    is not in sync: the two bricks show different versions
                    of this file.

                  You can see this in the getfattr output as well.

                           
                      # getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
                      getfattr: Removing leading '/' from absolute path names
                      # file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
                      trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
                      trusted.afr.c_glusterfs-client-2=0x000000000000000000000000
                      trusted.afr.c_glusterfs-client-4=0x000000000000000000000000
                      trusted.afr.c_glusterfs-client-6=0x000000000000000000000000
                      trusted.afr.c_glusterfs-client-8=0x000000060000000000000000
                      // client-8 is the latest client in our case, and the leading 8 hex digits (00000006) indicate that there is something in the changelog data.
                      trusted.afr.dirty=0x000000000000000000000000
                      trusted.bit-rot.version=0x000000000000001356d86c0c000217fd
                      trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
                              

                      # lhsh 002500 getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
                      getfattr: Removing leading '/' from absolute path names
                      # file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
                      trusted.afr.c_glusterfs-client-1=0x000000000000000000000000
                      // and here we can say that there is no split brain but the file is out of sync
                      trusted.afr.dirty=0x000000000000000000000000
                      trusted.bit-rot.version=0x000000000000001156d86c290005735c
                      trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
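
When comparing dumps like the two above by eye, it is easy to miss a non-zero value. The following is a rough sketch, not a Gluster utility, that scans the text of a `getfattr -d -e hex` dump and returns the trusted.afr.* keys whose value is not all zeros. The function name pending_afr_keys is hypothetical, and the sample input is a shortened copy of the first brick's output from this thread.

```python
def pending_afr_keys(getfattr_output: str) -> dict:
    """Return the trusted.afr.* xattrs in a getfattr dump that are non-zero."""
    pending = {}
    for line in getfattr_output.splitlines():
        line = line.strip()
        # Skip comments, other xattrs (gfid, bit-rot) and anything malformed.
        if not line.startswith("trusted.afr.") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # int() with base 16 accepts the "0x" prefix that getfattr prints.
        if int(value, 16) != 0:
            pending[key] = value
    return pending


# Shortened copy of the first brick's dump quoted above.
brick_dump = """\
# file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
trusted.afr.c_glusterfs-client-8=0x000000060000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
"""

print(pending_afr_keys(brick_dump))
# → {'trusted.afr.c_glusterfs-client-8': '0x000000060000000000000000'}
```

Running the same function on the second brick's dump would return an empty dict, since every trusted.afr.* value there is zero.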
                            

                     Regards,

                                
                     Abhishek 

                  
        -- 

        
            Regards

            Abhishek Paliwal

          