Re: afr-self-heald.c:479:afr_shd_index_sweep

Paolo Margara <paolo.margara@xxxxxxxxx> · Thu, 29 Jun 2017 09:38:03 +0200



    Hi all,
    for the upgrade I followed this procedure:
    
      put node in maintenance mode (ensure no clients are active)
      yum versionlock delete glusterfs*
      service glusterd stop
      yum update
      systemctl daemon-reload
      service glusterd start
      yum versionlock add glusterfs*
      gluster volume heal vm-images-repo full
      gluster volume heal vm-images-repo info
    
    On each server, after every update I ran 'gluster --version' to
      confirm the upgrade; at the end I ran 'gluster volume set all
      cluster.op-version 30800'.
    Today I tried manually killing a brick process on a non-critical
      volume; after that I see the following in the log:
    [2017-06-29 07:03:50.074388] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.12 (args: /usr/sbin/glusterfsd -s virtnode-0-1-gluster --volfile-id iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo -p /var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid -S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket --brick-name /data/glusterfs/brick1b/iso-images-repo -l /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log --xlator-option *-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73 --brick-port 49163 --xlator-option iso-images-repo-server.listen-port=49163)
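    Since the brick log records the version at startup, the 'Started
      running' entry can be used to confirm which binary a brick was last
      started with. A minimal sketch, assuming the log line format shown
      above; the helper name brick_started_version is only illustrative
      and not part of gluster:

```shell
# brick_started_version LOGFILE   (hypothetical helper, not a gluster command)
# Prints the version reported by the most recent "Started running ... version X"
# entry found in the given brick log. Prints nothing if no such entry exists.
brick_started_version() {
    sed -n 's/.*Started running [^ ]* version \([0-9][0-9.]*\).*/\1/p' "$1" | tail -n 1
}
```

    For example: brick_started_version
      /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
      A brick that has not been restarted since the update would presumably
      still report the old version here, provided its log has not been
      rotated in the meantime.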
    After the restart I checked, and the 'entry-changes' directory has
      indeed been created. But why did stopping the glusterd service
      not also stop the brick processes?
    And how can I recover from this issue now? Is restarting all the
      brick processes enough?
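    To verify the remaining bricks once they have been restarted, a small
      check could list which index subdirectories are still missing. This
      is only a sketch: missing_index_dirs is a hypothetical helper name,
      and the three directory names are the ones mentioned in the earlier
      thread.

```shell
# missing_index_dirs BRICK_PATH   (hypothetical helper, not a gluster command)
# Prints the full path of each expected index subdirectory that does not
# exist under BRICK_PATH/.glusterfs/indices. Prints nothing if all exist.
missing_index_dirs() {
    for d in dirty entry-changes xattrop; do
        [ -d "$1/.glusterfs/indices/$d" ] || printf '%s\n' "$1/.glusterfs/indices/$d"
    done
}

# Example use on one server (brick paths as in my first mail):
#   for b in /data/glusterfs/brick1a/hosted-engine \
#            /data/glusterfs/brick1b/iso-images-repo; do
#       missing_index_dirs "$b"
#   done
```

    The check only reads the filesystem, so it should be safe to run
      while the volumes are online.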
    

    Greetings,
        Paolo Margara

    
    On 28/06/2017 18:41, Pranith Kumar Karampuri wrote:

    
          On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

            
                On 06/28/2017 06:52 PM, Paolo Margara wrote:

                  
                    Hi list,

                    
                    yesterday I noticed the following lines in the
                    glustershd.log log file:

                    
                    [2017-06-28 11:53:05.000890] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-iso-images-repo-replicate-0: unable to get index-dir on iso-images-repo-client-0
                    [2017-06-28 11:53:05.001146] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-0: unable to get index-dir on vm-images-repo-client-0
                    [2017-06-28 11:53:06.001141] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-hosted-engine-replicate-0: unable to get index-dir on hosted-engine-client-0
                    [2017-06-28 11:53:08.001094] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-2: unable to get index-dir on vm-images-repo-client-6
                    [2017-06-28 11:53:08.001170] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-1: unable to get index-dir on vm-images-repo-client-3

                    
                    Digging through the mailing list archive I found
                    another user with a similar issue (the thread was
                    'glustershd: unable to get index-dir on
                    myvolume-client-0'). The suggested solution was to
                    verify that the
                    /<path-to-backend-brick>/.glusterfs/indices
                    directory contains all of these subdirectories:
                    'dirty', 'entry-changes' and 'xattrop', and if any
                    of them does not exist, simply create it with mkdir.

                    
                    In my case the 'entry-changes' directory is missing
                    on every brick on every server:

                    
                    /data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
                    total 0
                    drw------- 2 root root 55 Jun 28 15:02 dirty
                    drw------- 2 root root 57 Jun 28 15:02 xattrop

                    /data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
                    total 0
                    drw------- 2 root root 55 May 29 14:04 dirty
                    drw------- 2 root root 57 May 29 14:04 xattrop

                    /data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
                    total 0
                    drw------- 2 root root 112 Jun 28 15:02 dirty
                    drw------- 2 root root  66 Jun 28 15:02 xattrop

                    /data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
                    total 0
                    drw------- 2 root root 64 Jun 28 15:02 dirty
                    drw------- 2 root root 66 Jun 28 15:02 xattrop

                    /data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
                    total 0
                    drw------- 2 root root 112 Jun 28 15:02 dirty
                    drw------- 2 root root  66 Jun 28 15:02 xattrop

                    
                    I recently upgraded gluster from 3.7.16 to 3.8.12
                    with the rolling upgrade procedure, and I had not
                    noticed this issue prior to the update. On another
                    system upgraded with the same procedure I haven't
                    encountered this problem.

                    
                    Currently all VM images appear to be OK, but before
                    creating the 'entry-changes' directories I would
                    like to ask if this is still the correct procedure
                    to fix this issue

                  
              Did you restart the bricks after the upgrade? That should
              have created the entry-changes directory. Can you kill the
              brick, restart it, and see if the dir is created? Double
              check from the brick logs that you're indeed running
              3.8.12: "Started running /usr/local/sbin/glusterfsd version
              3.8.12" should appear when the brick starts.

            
            Please note that if you go the route of killing and
              restarting, you need to do it the same way you did the
              rolling upgrade: wait for the heal to complete on one
              node before you do the same on the other nodes. But
              before you do this, it would be better to look at the
              logs or confirm the steps you used for the upgrade.

            
              -Ravi
              
                
                    and whether this problem could have affected the
                    heal operations that occurred in the meantime.

                    
                    Thanks.

                    Greetings,
                         Paolo Margara

                    _______________________________________________
                    Gluster-users mailing list
                    Gluster-users@xxxxxxxxxxx
                    http://lists.gluster.org/mailman/listinfo/gluster-users

                  

                
          -- 

          
            Pranith

            
    -- 
LABINF - HPC@POLITO
DAUIN - Politecnico di Torino
Corso Castelfidardo, 34D - 10129 Torino (TO)
phone: +39 011 090 7051
site: http://www.labinf.polito.it/
site: http://hpc.polito.it/
  
