On 06/29/2017 01:08 PM, Paolo Margara wrote:
Hi all,
for the upgrade I followed this procedure:
- put node in maintenance mode (ensure no clients are active)
- yum versionlock delete glusterfs*
- service glusterd stop
- yum update
- systemctl daemon-reload
- service glusterd start
- yum versionlock add glusterfs*
- gluster volume heal vm-images-repo full
- gluster volume heal vm-images-repo info
On each server I ran 'gluster --version' after the upgrade to confirm it; at the end I ran 'gluster volume set all cluster.op-version 30800'.
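For what it's worth, one way to confirm the op-version bump on each node is to read glusterd's info file (default path for RPM installs):

    cat /var/lib/glusterd/glusterd.info
    # the 'operating-version' line should read 30800 after the bump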
Today I tried to manually kill a brick process on a non-critical volume; after that I see the following in the log:
[2017-06-29 07:03:50.074388] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.12 (args: /usr/sbin/glusterfsd -s virtnode-0-1-gluster --volfile-id iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo -p /var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid -S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket --brick-name /data/glusterfs/brick1b/iso-images-repo -l /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log --xlator-option *-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73 --brick-port 49163 --xlator-option iso-images-repo-server.listen-port=49163)
I've checked after the restart and indeed the 'entry-changes' directory is now created, but why did stopping the glusterd service not also stop the brick processes?
Just stopping, upgrading, and restarting glusterd does not restart the brick processes. You would need to kill all gluster processes on the node before upgrading; after the upgrade, when you restart glusterd, it will automatically spawn the rest of the gluster processes on that node.
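A minimal sketch of that shutdown sequence (assuming a systemd node like the ones in this thread; adjust to your environment):

    systemctl stop glusterd
    killall glusterfs glusterfsd glusterd   # bricks, self-heal daemon, any leftovers
    pgrep -l gluster                        # should print nothing before you upgrade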
Now how can I recover from this issue? Restarting all brick processes is enough?

Yes, but ensure there are no pending heals like Pranith mentioned. https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.7/ lists the steps for the upgrade to 3.7, but the steps mentioned there are similar for any rolling upgrade.
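A sketch of that recovery per volume, using 'vm-images-repo' from this thread as the example ('start ... force' only respawns bricks that are down, it does not disturb running ones):

    gluster volume start vm-images-repo force
    gluster volume heal vm-images-repo info | grep 'Number of entries'   # wait for 0 everywhere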
-Ravi
Greetings,
Paolo Margara
On 28/06/2017 18:41, Pranith Kumar Karampuri wrote:
On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Did you restart the bricks after the upgrade? That should have created the entry-changes directory. Can you kill the brick and restart it and see if the dir is created? Double-check from the brick logs that you're indeed running 3.8.12: "Started running /usr/local/sbin/glusterfsd version 3.8.12" should appear when the brick starts.
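For example, on these nodes something like this should show the version the brick came up with (log path taken from the brick listing later in this thread):

    grep 'Started running' /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log | tail -1

On 06/28/2017 06:52 PM, Paolo Margara wrote: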
Hi list,
yesterday I noticed the following lines in the glustershd.log log file:
[2017-06-28 11:53:05.000890] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-iso-images-repo-replicate-0: unable to get index-dir on iso-images-repo-client-0
[2017-06-28 11:53:05.001146] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-0: unable to get index-dir on vm-images-repo-client-0
[2017-06-28 11:53:06.001141] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-hosted-engine-replicate-0: unable to get index-dir on hosted-engine-client-0
[2017-06-28 11:53:08.001094] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-2: unable to get index-dir on vm-images-repo-client-6
[2017-06-28 11:53:08.001170] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-1: unable to get index-dir on vm-images-repo-client-3
Digging into the mailing list archive I found another user with a similar issue (the thread was 'glustershd: unable to get index-dir on myvolume-client-0'); the suggested solution was to verify that the /<path-to-backend-brick>/.glusterfs/indices directory contains all of these subdirectories: 'dirty', 'entry-changes' and 'xattrop', and, if any of them does not exist, to simply create it with mkdir.
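A sketch of that check-and-create over the brick paths shown below, to be run on every server ('mkdir -m 600' matches the mode of the existing index directories in the listing):

    for idx in /data/glusterfs/brick*/*/.glusterfs/indices; do
        for d in dirty entry-changes xattrop; do
            [ -d "$idx/$d" ] || mkdir -m 600 "$idx/$d"
        done
    done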
In my case the 'entry-changes' directory is missing from all the bricks on all the servers:
/data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
total 0
drw------- 2 root root 55 Jun 28 15:02 dirty
drw------- 2 root root 57 Jun 28 15:02 xattrop
/data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 55 May 29 14:04 dirty
drw------- 2 root root 57 May 29 14:04 xattrop
/data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 112 Jun 28 15:02 dirty
drw------- 2 root root 66 Jun 28 15:02 xattrop
/data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 64 Jun 28 15:02 dirty
drw------- 2 root root 66 Jun 28 15:02 xattrop
/data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 112 Jun 28 15:02 dirty
drw------- 2 root root 66 Jun 28 15:02 xattrop
I've recently upgraded gluster from 3.7.16 to 3.8.12 with the rolling upgrade procedure and I hadn't noticed this issue prior to the upgrade; on another system upgraded with the same procedure I haven't encountered this problem.
Currently all VM images appear to be OK, but before creating the 'entry-changes' directories I would like to ask if this is still the correct procedure to fix this issue, and whether this problem could have affected the heal operations that occurred in the meantime.
Please note that if you are going the route of killing and restarting, you need to do it the same way you did the rolling upgrade: you need to wait for heal to complete on one node before you kill the processes on the next, with something like the wait loop below. But before you do this, it is better to look at the logs or confirm the steps you used for the upgrade.
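A minimal sketch of such a wait loop, again using 'vm-images-repo' as the example volume (it sums the 'Number of entries' counters that 'heal info' prints per brick):

    until gluster volume heal vm-images-repo info | awk '/Number of entries:/ {s+=$NF} END {exit s>0}'; do
        sleep 60   # still healing; check again in a minute
    done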
-Ravi
Thanks.
Greetings,
Paolo Margara
--
Pranith
--
LABINF - HPC@POLITO
DAUIN - Politecnico di Torino
Corso Castelfidardo, 34D - 10129 Torino (TO)
phone: +39 011 090 7051
site: http://www.labinf.polito.it/
site: http://hpc.polito.it/
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users