Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

On 07/21/2017 02:55 PM, yayo (j) wrote:
2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar@xxxxxxxxxx>:


But it does  say something. All these gfids of completed heals in the log below are the for the ones that you have given the getfattr output of. So what is likely happening is there is an intermittent connection problem between your mount and the brick process, leading to pending heals again after the heal gets completed, which is why the numbers are varying each time. You would need to check why that is the case.
Hope this helps,
Ravi



[2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
[2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2
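
For reference, a minimal sketch of how one of these gfids can be mapped back to the file on a brick and its afr xattrs dumped (the brick path below is only a placeholder; substitute your own):

    # placeholder brick path and one of the gfids from the log above
    BRICK=/gluster/engine/brick
    GFID=e6dfd556-340b-4b76-b47b-7b6f5bd74327

    # dump the afr pending xattrs for that gfid directly
    getfattr -d -m . -e hex $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID

    # resolve the gfid back to the real file name (regular files are hard links)
    find $BRICK -samefile $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID -not -path "*/.glusterfs/*"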


Hi,

But we have 2 gluster volumes on the same network, and the other one (the "Data" gluster volume) doesn't have any problems. Why do you think there is a network problem?

Because pending self-heals come into the picture when I/O from the clients (mounts) does not succeed on some bricks. They are mostly due to
(a) the client losing connection to some bricks (likely), or
(b) the I/O failing on the bricks themselves (unlikely).

If most of the I/O is also reaching the 3rd brick (since you say the files are already present on all bricks and I/O is successful), then it is likely to be (a).
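
If it helps, a quick way to check this (a minimal sketch; 'engine' is the volume name taken from your logs, and these commands can be run on any node in the cluster) is to see whether one brick keeps losing clients while heals keep reappearing:

    # list the clients currently connected to each brick of the volume
    gluster volume status engine clients

    # check whether pending-heal entries reappear after heals complete
    gluster volume heal engine info

If the client list for one brick keeps shrinking and growing back while the other two stay stable, that points to (a).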

  How can I check this on a gluster infrastructure?

In the fuse mount logs for the engine volume, check if there are any messages about brick disconnects, something along the lines of "disconnected from volname-client-x".
Just guessing here, but maybe the 'data' volume also experienced disconnects and self-heals later and you simply did not observe them when you ran heal info. See the glustershd log or the mount log for self-heal completion messages on 0-data-replicate-0 as well.
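
For example (the fuse mount log name below is only a guess, since it depends on your mount point; /var/log/glusterfs/glustershd.log is the usual self-heal daemon log):

    # look for brick disconnects in the fuse mount log of the engine volume
    grep "disconnected from" /var/log/glusterfs/*engine*.log

    # look for self-heal completions on the data volume in the self-heal daemon log
    grep "Completed.*selfheal" /var/log/glusterfs/glustershd.log | grep "0-data-replicate-0"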

Regards,
Ravi
Thank you




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
