Re: afr-self-heald.c:479:afr_shd_index_sweep

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Thu, 29 Jun 2017 19:57:06 +0530

On Thu, Jun 29, 2017 at 7:48 PM, Paolo Margara <paolo.margara@xxxxxxxxx> wrote:

    Hi Pranith,
    I'm using this guide
https://github.com/nixpanic/glusterdocs/blob/f6d48dc17f2cb6ee4680e372520ec3358641b2bc/Upgrade-Guide/upgrade_to_3.8.md
    Definitely my fault, but I think it would be better to state
      somewhere that restarting the service is not enough, simply because
      for many other services it is sufficient.

As per the page (https://github.com/nixpanic/glusterdocs/blob/f6d48dc17f2cb6ee4680e372520ec3358641b2bc/Upgrade-Guide/upgrade_to_3.8.md#online-upgrade-procedure-for-servers), the steps include the following command before installing 3.8, so I guess we have it covered?

"Stop all gluster services using the below command or through your favorite way to stop them.
killall glusterfs glusterfsd glusterd"

    Now I'm restarting every brick process (and waiting for the heal
      to complete); this is fixing my problem.
    Many thanks for the help.

    Greetings,
        Paolo

    On 29/06/2017 13:03, Pranith Kumar Karampuri wrote:

        Paolo,

              Which document did you follow for the upgrade? We can fix
        the documentation if there are any issues.

        On Thu, Jun 29, 2017 at 2:07 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

                On 06/29/2017 01:08 PM, Paolo Margara wrote:

                  Hi all,
                  for the upgrade I followed this procedure:

                    put node in maintenance mode (ensure no clients
                      are active)
                    yum versionlock delete glusterfs*

                    service glusterd stop
                    yum update
                    systemctl daemon-reload 

                    service glusterd start
                    yum versionlock add glusterfs*
                    gluster volume heal vm-images-repo full
                    gluster volume heal vm-images-repo info

                  On each server, each time, I ran 'gluster --version'
                    to confirm the upgrade; at the end I ran 'gluster
                    volume set all cluster.op-version 30800'.
                  Today I tried manually killing a brick process
                    on a non-critical volume; after that I see the
                    following in the log:
                  [2017-06-29 07:03:50.074388] I [MSGID: 100030]
                    [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfsd:
                    Started running /usr/sbin/glusterfsd version 3.8.12
                    (args: /usr/sbin/glusterfsd -s virtnode-0-1-gluster
                    --volfile-id
                    iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo
                    -p
                    /var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid
                    -S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket
                    --brick-name /data/glusterfs/brick1b/iso-images-repo
                    -l /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
                    --xlator-option *-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73
                    --brick-port 49163 --xlator-option
                    iso-images-repo-server.listen-port=49163)
                  I checked after the restart and the 'entry-changes'
                    directory has indeed been created now, but why did
                    stopping the glusterd service not also stop
                    the brick processes?

               Just stopping, upgrading and restarting glusterd
              does not restart the brick processes. You would need to
              kill all gluster processes on the node before upgrading.
              After upgrading, when you restart glusterd, it will
              automatically spawn the rest of the gluster processes on
              that node.

                  Now how can I recover from this issue? Is restarting
                    all brick processes enough?

               Yes, but ensure there are no pending heals, as
              Pranith mentioned. https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.7/
              lists the steps for upgrading to 3.7, but the steps mentioned
              there are similar for any rolling upgrade.

              -Ravi

                    Greetings,
                        Paolo Margara

                    On 28/06/2017 18:41, Pranith Kumar Karampuri wrote:

                          On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

                                On 06/28/2017 06:52 PM, Paolo Margara wrote:

                                   Hi list,

                                    yesterday I noticed the following
                                    lines in the glustershd.log
                                    file:

                                    [2017-06-28 11:53:05.000890] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-iso-images-repo-replicate-0: unable to get index-dir on iso-images-repo-client-0
                                    [2017-06-28 11:53:05.001146] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-0: unable to get index-dir on vm-images-repo-client-0
                                    [2017-06-28 11:53:06.001141] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-hosted-engine-replicate-0: unable to get index-dir on hosted-engine-client-0
                                    [2017-06-28 11:53:08.001094] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-2: unable to get index-dir on vm-images-repo-client-6
                                    [2017-06-28 11:53:08.001170] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-1: unable to get index-dir on vm-images-repo-client-3

                                    Digging into the mailing list
                                    archive I found another user with a
                                    similar issue (the thread was
                                    'glustershd: unable to get
                                    index-dir on myvolume-client-0').
                                    The suggested solution was to verify
                                    whether the
                                    /<path-to-backend-brick>/.glusterfs/indices
                                    directory contains all of these
                                    subdirectories: 'dirty',
                                    'entry-changes' and 'xattrop', and if
                                    any of them does not exist, simply
                                    create it with mkdir.
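
A minimal sketch of the directory check described above, assuming a POSIX shell on the brick host. The function name missing_index_dirs is hypothetical. It only reports which of the three subdirectories are absent and deliberately creates nothing, since the replies in this thread point to restarting the upgraded brick processes as the way to get 'entry-changes' created.

```shell
# Hypothetical helper: print the names of the expected index
# subdirectories that are missing under <brick>/.glusterfs/indices.
# Prints nothing when all three are present.
missing_index_dirs() {
    brick=$1
    for sub in dirty entry-changes xattrop; do
        [ -d "$brick/.glusterfs/indices/$sub" ] || echo "$sub"
    done
}

# Usage (brick path taken from the listing in this thread):
#   missing_index_dirs /data/glusterfs/brick1a/hosted-engine
```

Running it once per brick path on each server gives the same information as the ls listings, in a form that is easy to compare across nodes.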

                                    In my case the 'entry-changes'
                                    directory is missing on all the
                                    bricks on all the servers:

                                    /data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
                                    total 0
                                    drw------- 2 root root 55 Jun 28 15:02 dirty
                                    drw------- 2 root root 57 Jun 28 15:02 xattrop

                                    /data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
                                    total 0
                                    drw------- 2 root root 55 May 29 14:04 dirty
                                    drw------- 2 root root 57 May 29 14:04 xattrop

                                    /data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
                                    total 0
                                    drw------- 2 root root 112 Jun 28 15:02 dirty
                                    drw------- 2 root root  66 Jun 28 15:02 xattrop

                                    /data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
                                    total 0
                                    drw------- 2 root root 64 Jun 28 15:02 dirty
                                    drw------- 2 root root 66 Jun 28 15:02 xattrop

                                    /data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
                                    total 0
                                    drw------- 2 root root 112 Jun 28 15:02 dirty
                                    drw------- 2 root root  66 Jun 28 15:02 xattrop

                                    I recently upgraded gluster from
                                    3.7.16 to 3.8.12 with the rolling
                                    upgrade procedure and I hadn't
                                    noticed this issue prior to the
                                    update; on another system upgraded
                                    with the same procedure I haven't
                                    encountered this problem.

                                    Currently all VM images appear to be
                                    OK, but before creating the
                                    'entry-changes' directory I would like
                                    to ask if this is still the correct
                                    procedure to fix this issue

                              Did you restart the bricks after the
                              upgrade? That should have created the
                              entry-changes directory. Can you kill the
                              brick and restart it and see if the dir is
                              created? Double check from the brick logs
                              that you're indeed running 3.8.12: "Started
                              running /usr/local/sbin/glusterfsd version
                              3.8.12" should appear when the brick
                              starts.
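
A minimal sketch of that log check, assuming the brick log contains the "Started running <binary> version <X>" line in the format quoted in this thread. The function name brick_version is hypothetical, and the log path in the usage comment is only an example taken from the log excerpt in this thread.

```shell
# Hypothetical helper: read a brick log on stdin and print the version
# from the most recent "Started running ... version X" line.
# Prints nothing if no such line is found.
brick_version() {
    sed -n 's/.*Started running [^ ]* version \([^ ]*\).*/\1/p' | tail -n 1
}

# Usage (example path; adjust to the brick log on your node):
#   brick_version < /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
```

Because only the last matching line is reported, an old version here suggests that the brick process has not been restarted since the packages were upgraded.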

                            Please note that if you are going the
                              route of killing and restarting, you need
                              to do it the same way you did the rolling
                              upgrade: wait for the heal to complete
                              before you kill the other nodes.
                              But before you do this, it is better to
                              look at the logs or confirm the steps you
                              used for the upgrade.
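
A rough sketch of "wait for heal to complete", assuming that 'gluster volume heal <volume> info' prints one "Number of entries: N" line per brick. The function names pending_heal_entries and wait_for_heal and the 30-second default interval are hypothetical choices for illustration, not part of gluster.

```shell
# Hypothetical helper: sum the "Number of entries: N" lines read on stdin.
# Exits non-zero if any count is not a number (for example when a brick
# could not be queried), so that case is not mistaken for "no heals".
pending_heal_entries() {
    awk '
        /^Number of entries:/ {
            if ($4 ~ /^[0-9]+$/) sum += $4
            else unknown = 1
        }
        END { print sum + 0; exit (unknown ? 1 : 0) }
    '
}

# Hypothetical helper: poll until the given volume has no pending heals.
# $1 = volume name, $2 = seconds between polls (assumed default: 30).
wait_for_heal() {
    vol=$1
    interval=${2:-30}
    while :; do
        if count=$(gluster volume heal "$vol" info | pending_heal_entries) \
            && [ "$count" -eq 0 ]; then
            return 0
        fi
        echo "volume $vol: heal still pending or a brick is unreachable" >&2
        sleep "$interval"
    done
}

# Usage, once per volume before moving on to the next node:
#   wait_for_heal vm-images-repo
```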

                              -Ravi

                                     and if
                                    this problem could have affected the
                                    heal operations that occurred in the
                                    meantime.

                                    Thanks.

                                    Greetings,

                                         Paolo Margara

                                    _______________________________________________
                                    Gluster-users mailing list
                                    Gluster-users@xxxxxxxxxxx
                                    http://lists.gluster.org/mailman/listinfo/gluster-users


                          -- 

                            Pranith

        -- 

          Pranith

-- 
Pranith
