Re: afr-self-heald.c:479:afr_shd_index_sweep

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Thu, 29 Jun 2017 20:24:17 +0530

On Thu, Jun 29, 2017 at 8:12 PM, Paolo Margara <paolo.margara@xxxxxxxxx> wrote:

    On 29/06/2017 16:27, Pranith Kumar Karampuri wrote:

          On Thu, Jun 29, 2017 at 7:48 PM, Paolo Margara <paolo.margara@xxxxxxxxx> wrote:

                Hi Pranith,
                I'm using this guide
                  https://github.com/nixpanic/glusterdocs/blob/f6d48dc17f2cb6ee4680e372520ec3358641b2bc/Upgrade-Guide/upgrade_to_3.8.md
                Definitely my fault, but I think it would be better to
                  state somewhere that restarting the service is not
                  enough, simply because in many other cases, with other
                  services, it is sufficient.

            The steps include the following command before
              installing 3.8 as per the page (https://github.com/nixpanic/glusterdocs/blob/f6d48dc17f2cb6ee4680e372520ec3358641b2bc/Upgrade-Guide/upgrade_to_3.8.md#online-upgrade-procedure-for-servers)

            So I guess we have it covered?

    As I said it's my fault ;-)

Ah! sorry. Thanks for your mail!

                Stop all gluster services using the below command or
                  through your favorite way to stop them.
                killall glusterfs glusterfsd glusterd

                Now I'm restarting every brick process (and waiting
                  for the heal to complete), and this is fixing my problem.
                Many thanks for the help.

                Greetings,
                    Paolo

                    On 29/06/2017 13:03, Pranith Kumar Karampuri wrote:

                        Paolo,

                              Which document did you follow for the
                        upgrade? We can fix the documentation if there
                        are any issues.

                        On Thu, Jun 29, 2017 at 2:07 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

                                On
                                  06/29/2017 01:08 PM, Paolo Margara
                                  wrote:

                                  Hi all,
                                  for the upgrade I followed this
                                    procedure:

                                    put node in maintenance mode
                                      (ensure no clients are active)
                                    yum versionlock delete glusterfs*

                                    service glusterd stop
                                    yum update
                                    systemctl daemon-reload

                                    service glusterd start
                                    yum versionlock add glusterfs*
                                    gluster volume heal vm-images-repo full
                                    gluster volume heal vm-images-repo info

                                  On each server I ran 'gluster --version'
                                    every time to confirm the upgrade; at
                                    the end I ran 'gluster volume set all
                                    cluster.op-version 30800'.
                                  Today I tried to manually kill a
                                    brick process on a non-critical
                                    volume; after that I see in the log:
                                  [2017-06-29 07:03:50.074388] I
                                    [MSGID: 100030]
                                    [glusterfsd.c:2454:main]
                                    0-/usr/sbin/glusterfsd: Started
                                    running /usr/sbin/glusterfsd version
                                    3.8.12 (args: /usr/sbin/glusterfsd
                                    -s virtnode-0-1-gluster --volfile-id
                                    iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo
                                    -p /var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid
                                    -S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket
                                    --brick-name
                                    /data/glusterfs/brick1b/iso-images-repo
                                    -l /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
                                    --xlator-option
                                    *-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73
                                    --brick-port 49163 --xlator-option
                                    iso-images-repo-server.listen-port=49163)
                                  I've checked after the restart and
                                    indeed the 'entry-changes' directory
                                    has now been created, but why did
                                    stopping the glusterd service not
                                    also stop the brick processes?

                               Just stopping, upgrading and
                              restarting glusterd does not restart the
                              brick processes. You would need to kill
                              all gluster processes on the node before
                              upgrading.  After upgrading, when you
                              restart glusterd, it will automatically
                              spawn the rest of the gluster processes on
                              that node.
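
A minimal sketch of the per-node order described above, assuming the
commands quoted elsewhere in this thread (killall, yum update, service
glusterd start, gluster volume heal ... info). The helper names run and
upgrade_node and the DRY_RUN variable are hypothetical; with DRY_RUN
left at its default of 1 the commands are only printed, not executed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Per-node sequence pieced together from the replies in this thread.
# $1 is the volume whose heal status is checked afterwards.
upgrade_node() {
    vol=$1
    run killall glusterfs glusterfsd glusterd   # stop bricks too, not only glusterd
    run yum update                              # upgrade the packages
    run service glusterd start                  # glusterd respawns the other processes
    run gluster volume heal "$vol" info         # check pending heals before the next node
}
```

Calling `upgrade_node vm-images-repo` prints the four commands in order;
the point of the sketch is only that the brick processes are stopped
before the packages are upgraded.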

                                  Now how can I recover from this
                                    issue? Is restarting all brick
                                    processes enough?

                               Yes, but ensure there are no
                              pending heals like Pranith mentioned. https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.7/ 
                              lists the steps for upgrade to 3.7 but the
                              steps mentioned there are similar for any
                              rolling upgrade.

                              -Ravi

                                    Greetings,
                                        Paolo Margara

                                    Il
                                      28/06/2017 18:41, Pranith Kumar
                                      Karampuri ha scritto:

                                          On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

                                                On 06/28/2017 06:52 PM, Paolo Margara wrote:

                                                    Hi list,

                                                    yesterday I noticed the following lines in the
                                                    glustershd.log file:

                                                    [2017-06-28 11:53:05.000890] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-iso-images-repo-replicate-0: unable to get index-dir on iso-images-repo-client-0
                                                    [2017-06-28 11:53:05.001146] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-0: unable to get index-dir on vm-images-repo-client-0
                                                    [2017-06-28 11:53:06.001141] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-hosted-engine-replicate-0: unable to get index-dir on hosted-engine-client-0
                                                    [2017-06-28 11:53:08.001094] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-2: unable to get index-dir on vm-images-repo-client-6
                                                    [2017-06-28 11:53:08.001170] W [MSGID: 108034] [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-1: unable to get index-dir on vm-images-repo-client-3
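
To see at a glance which client subvolumes are affected, the warnings
can be reduced to a list of unique names. This is a rough sketch that
assumes each log entry sits on a single line in the real glustershd.log,
in the form quoted in this message; the function name
index_dir_warnings is hypothetical.

```shell
#!/bin/sh
# List the unique client names that appear after
# "unable to get index-dir on" in the given log file ($1).
index_dir_warnings() {
    sed -n 's/.*unable to get index-dir on \([^ ]*\).*/\1/p' "$1" | sort -u
}
```

It would be called with the path to the self-heal daemon log, for
example `index_dir_warnings /var/log/glusterfs/glustershd.log` (the
path is an assumption based on the log directory shown in this thread).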

                                                    Digging into the mailing list archive I found another user
                                                    with a similar issue (the thread was 'glustershd: unable to
                                                    get index-dir on myvolume-client-0'). The suggested solution
                                                    was to verify that the
                                                    /<path-to-backend-brick>/.glusterfs/indices directory
                                                    contains all of these subdirectories: 'dirty',
                                                    'entry-changes' and 'xattrop', and if any of them does not
                                                    exist, simply create it with mkdir.
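
The check described above can be sketched as a small shell function.
The three directory names come from this message; the function name
missing_index_dirs is hypothetical, and the sketch only reports what is
missing, it does not create anything.

```shell
#!/bin/sh
# Print the expected index subdirectories that are missing under
# <brick>/.glusterfs/indices, one per line. $1 is the brick path.
missing_index_dirs() {
    brick=$1
    for sub in dirty entry-changes xattrop; do
        if [ ! -d "$brick/.glusterfs/indices/$sub" ]; then
            printf '%s\n' "$sub"
        fi
    done
}
```

For example, `missing_index_dirs /data/glusterfs/brick2/vm-images-repo`
prints nothing when all three subdirectories are present.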

                                                    In my case the 'entry-changes' directory is missing on all
                                                    the bricks and on all the servers:

                                                    /data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
                                                    total 0
                                                    drw------- 2 root root 55 Jun 28 15:02 dirty
                                                    drw------- 2 root root 57 Jun 28 15:02 xattrop

                                                    /data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
                                                    total 0
                                                    drw------- 2 root root 55 May 29 14:04 dirty
                                                    drw------- 2 root root 57 May 29 14:04 xattrop

                                                    /data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
                                                    total 0
                                                    drw------- 2 root root 112 Jun 28 15:02 dirty
                                                    drw------- 2 root root  66 Jun 28 15:02 xattrop

                                                    /data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
                                                    total 0
                                                    drw------- 2 root root 64 Jun 28 15:02 dirty
                                                    drw------- 2 root root 66 Jun 28 15:02 xattrop

                                                    /data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
                                                    total 0
                                                    drw------- 2 root root 112 Jun 28 15:02 dirty
                                                    drw------- 2 root root  66 Jun 28 15:02 xattrop

                                                    I recently upgraded gluster from 3.7.16 to 3.8.12 with the
                                                    rolling upgrade procedure, and I hadn't noticed this issue
                                                    prior to the update; on another system upgraded with the
                                                    same procedure I haven't encountered this problem.

                                                    Currently all VM images appear to be OK, but before
                                                    creating the 'entry-changes' directory I would like to ask
                                                    if this is still the correct procedure to fix this issue

                                              Did you restart the bricks
                                              after the upgrade? That
                                              should have created the
                                              entry-changes directory.
                                              Can you kill the brick and
                                              restart it and see if the
                                              dir is created? Double
                                              check from the brick logs
                                              that you're indeed running
                                              3.8.12:  "Started running
                                              /usr/local/sbin/glusterfsd
                                              version 3.8.12" should
                                              appear when the brick
                                              starts.
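
One way to do the double check suggested above is to pull the version
out of the most recent "Started running" line in the brick log. This is
only a sketch: the function name brick_log_version is hypothetical, and
it assumes the log line has the form quoted in this thread.

```shell
#!/bin/sh
# Print the version reported by the last "Started running ... version X"
# line in the given brick log file ($1). Prints nothing if no such line exists.
brick_log_version() {
    grep 'Started running' "$1" | tail -n 1 |
        sed -n 's/.*Started running [^ ]* version \([0-9][0-9.]*\).*/\1/p'
}
```

It would be run against a brick log such as
/var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log,
the path that appears in the brick arguments elsewhere in this thread.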

                                            Please note that if you
                                              are going the route of
                                              killing and restarting,
                                              you need to do it in the
                                              same way you did the rolling
                                              upgrade. You need to wait
                                              for the heal to complete
                                              before you kill the other
                                              nodes. But before you do
                                              this, it is better to
                                              look at the logs or
                                              confirm the steps you used
                                              for the upgrade.
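
A rough sketch of how "wait for the heal to complete" could be checked
from a script: it sums the "Number of entries:" lines that
'gluster volume heal <volume> info' prints per brick. The function name
pending_heal_entries is hypothetical, and non-numeric values (for
example for a brick that is not connected) are counted as 0, so a
result of 0 should still be confirmed against the full output.

```shell
#!/bin/sh
# Read 'gluster volume heal <volume> info' output on stdin and print the
# total of all "Number of entries:" values (0 if there are none).
pending_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
```

Typical use would be
`gluster volume heal vm-images-repo info | pending_heal_entries`,
moving on to the next node only once the total reaches 0.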

                                              -Ravi

                                                    and if this problem could have affected the heal
                                                    operations that occurred in the meantime.

                                                    Thanks.

                                                    Greetings,

                                                         Paolo Margara

_______________________________________________

                                                    Gluster-users
                                                    mailing list

                                                    Gluster-users@xxxxxxxxxxx

                                                    http://lists.gluster.org/mailman/listinfo/gluster-users

                                          -- 

                                            Pranith


-- 
Pranith
