Re: Entries in heal pending

Hi Strahil,

Thanks for the input. It worked flawlessly!
I'm pasting the process here (it may be useful for someone).

### entering the "corrupt" gluster brick folder
[root@node1 data]# cd 7ac28c32-947b-4ad5-8d69-213a205f06e8/images/d7c11f2e-58e4-4fe1-8236-6ded0f4dd757/
[root@node1 d7c11f2e-58e4-4fe1-8236-6ded0f4dd757]# ls -la
total 1080
drwxr-xr-x. 2 vdsm kvm    8192 Jan  1  1970 .
drwxr-xr-x. 4 vdsm kvm      94 Dec 18 11:11 ..
-rw-rw----. 2 vdsm kvm   30720 Dec 17 13:17 e566f230-df72-4073-aecf-7e5a8d6b569b
-rw-rw----. 2 vdsm kvm 1048576 Dec  2 14:55 e566f230-df72-4073-aecf-7e5a8d6b569b.lease
-rw-r--r--. 2 vdsm kvm     429 Dec 17 13:17 e566f230-df72-4073-aecf-7e5a8d6b569b.meta
### making a backup of the files (I did this on all the nodes)
[root@node1 d7c11f2e-58e4-4fe1-8236-6ded0f4dd757]# mkdir -p /root/save/gluster_bricks/data/data/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/d7c11f2e-58e4-4fe1-8236-6ded0f4dd757
[root@node1 d7c11f2e-58e4-4fe1-8236-6ded0f4dd757]# cp * /root/save/gluster_bricks/data/data/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/d7c11f2e-58e4-4fe1-8236-6ded0f4dd757/
### rsyncing the files from the selected source (I chose node2 as the source, and did this on node3 as well)
[root@node1 d7c11f2e-58e4-4fe1-8236-6ded0f4dd757]# rsync -avh root@node2.ovirt.local:/gluster_bricks/data/data/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/d7c11f2e-58e4-4fe1-8236-6ded0f4dd757/e566f230-df72-4073-aecf-7e5a8d6b569b* .
receiving incremental file list

sent 20 bytes  received 129 bytes  99.33 bytes/sec
total size is 1.08M  speedup is 7,246.48
### started the healing
[root@node1 d7c11f2e-58e4-4fe1-8236-6ded0f4dd757]# gluster volume heal data
Launching heal operation to perform index self heal on volume data has been successful
Use heal info commands to check status.
### checking result
[root@node1 d7c11f2e-58e4-4fe1-8236-6ded0f4dd757]# gluster volume heal data info
Brick node1storage.ovirt.local:/gluster_bricks/data/data
/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/079904a4-71af-492c-bb2f-b45a918e8a2e/fce4b64d-2444-4f11-b226-db75bb2960c2.meta
Status: Connected
Number of entries: 1

Brick node2storage.ovirt.local:/gluster_bricks/data/data
/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/079904a4-71af-492c-bb2f-b45a918e8a2e/fce4b64d-2444-4f11-b226-db75bb2960c2.meta
Status: Connected
Number of entries: 1

Brick node3storage.ovirt.local:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

### As you can see, only one pending entry is left. I applied the same fix to the other files, and they also healed successfully.
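Repeating the fix for each remaining file can be semi-automated by pulling the pending paths out of `gluster volume heal <volume> info`. This is only a sketch: `pending_entries` is a hypothetical helper name, and the parsing is demonstrated on a captured sample of the output shown above rather than on a live cluster.

```shell
# Sketch: extract the brick-relative paths of pending entries from `heal info` output.
pending_entries() {
    grep '^/'    # entry paths are the lines that start with '/'
}

# Captured sample of the `gluster volume heal data info` output above:
sample='Brick node1storage.ovirt.local:/gluster_bricks/data/data
/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/079904a4-71af-492c-bb2f-b45a918e8a2e/fce4b64d-2444-4f11-b226-db75bb2960c2.meta
Status: Connected
Number of entries: 1'

# On a live node this would be: gluster volume heal data info | pending_entries | sort -u
echo "$sample" | pending_entries
```

Each extracted path can then be backed up and rsynced from the chosen source brick exactly as in the steps above.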

Best Regards,
  Balazs Szilagyi

On 2019.12.18. 20:07, Strahil Nikolov wrote:
You are the second person (excluding me) who has observed this behaviour.
The easiest way to resolve this is to:
1. Check which file is newest (there is a timestamp inside the file) for:
/gluster_bricks/data/data/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/d7c11f2e-58e4-4fe1-8236-6ded0f4dd757/e566f230-df72-4073-aecf-7e5a8d6b569b.meta
and for:
/gluster_bricks/data/data/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/079904a4-71af-492c-bb2f-b45a918e8a2e/fce4b64d-2444-4f11-b226-db75bb2960c2.meta

Let's assume node3storage.ovirt.local has the newest data.

2. Then back up the files locally (just in case you change your mind) and rsync them from node3storage.ovirt.local (or whichever node actually has the newest timestamp in the file) to the other bricks.

3. Run a gluster heal just to notify gluster that the issue is resolved.
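Step 1 above can be scripted. The snippet below is a minimal sketch: on a real cluster you would run `stat` over ssh against each brick host (as the commented line shows), while the runnable demo uses local temp files so the comparison itself is easy to try. `newest_of` is a hypothetical helper name, not a Gluster command.

```shell
# Sketch for step 1: pick the copy with the newest modification time.
# On the cluster, gather the times per brick host, e.g.:
#   ssh root@node2storage.ovirt.local stat -c '%y' /path/to/file.meta

newest_of() {
    # Print the path whose mtime (%Y = seconds since epoch) is largest.
    stat -c '%Y %n' "$@" | sort -nr | head -n 1 | cut -d' ' -f2-
}

# Local demo with two fake brick copies of a .meta file:
tmp=$(mktemp -d)
echo old > "$tmp/node1.meta"; touch -d '2019-12-02 14:55' "$tmp/node1.meta"
echo new > "$tmp/node2.meta"; touch -d '2019-12-17 13:17' "$tmp/node2.meta"
newest_of "$tmp/node1.meta" "$tmp/node2.meta"   # prints the node2 copy's path
rm -r "$tmp"
```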

In my case, one of the nodes had a newer version of the file (I am using a replica 2 arbiter 1 volume) and the gfid was different, which prevented Gluster from healing it.

Usually, oVirt just updates the timestamp in the meta files, so even an older version is not a problem.

P.S.: What version of Gluster are you using? I suppose v6.5 or v6.6, right?

Best Regards,
Strahil Nikolov




On Wednesday, 18 December 2019, 16:18:12 GMT+2, Szilágyi Balázs <szilagyi.balazs@xxxxxxxxxx> wrote:


Dear Gluster Users,

I'm a newbie to Gluster storage, and during stability testing I rebooted a
node and got some heal issues afterwards that I'm unable to fix.
The VMs are nevertheless running fine from the storage, and I have not
discovered any data corruption.
The system is oVirt version 4.3.7, with 3 nodes and a replica 3 volume.
Please let me know what to do with the pending heals that are unable to
finish.
Also let me know if any further details are needed.

Thanks,
  Balazs

[root@node2 ~]# gluster volume status data
Status of volume: data
Gluster process                                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------------------
Brick node1storage.ovirt.local:/gluster_bricks/data/data  49152     0          Y       4187
Brick node2storage.ovirt.local:/gluster_bricks/data/data  49154     0          Y       6163
Brick node3storage.ovirt.local:/gluster_bricks/data/data  49154     0          Y       19439
Self-heal Daemon on localhost                             N/A       N/A        Y       3136
Self-heal Daemon on node3storage.ovirt.local              N/A       N/A        Y       5876
Self-heal Daemon on node1storage.ovirt.local              N/A       N/A        Y       15479

Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks

[root@node2 ~]# gluster volume heal data info summary
Brick node1storage.ovirt.local:/gluster_bricks/data/data
Status: Connected
Total Number of entries: 2
Number of entries in heal pending: 2
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node2storage.ovirt.local:/gluster_bricks/data/data
Status: Connected
Total Number of entries: 2
Number of entries in heal pending: 2
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node3storage.ovirt.local:/gluster_bricks/data/data
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

[root@node2 ~]# gluster volume heal data info
Brick node1storage.ovirt.local:/gluster_bricks/data/data
/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/d7c11f2e-58e4-4fe1-8236-6ded0f4dd757/e566f230-df72-4073-aecf-7e5a8d6b569b.meta
/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/079904a4-71af-492c-bb2f-b45a918e8a2e/fce4b64d-2444-4f11-b226-db75bb2960c2.meta
Status: Connected
Number of entries: 2

Brick node2storage.ovirt.local:/gluster_bricks/data/data
/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/d7c11f2e-58e4-4fe1-8236-6ded0f4dd757/e566f230-df72-4073-aecf-7e5a8d6b569b.meta
/7ac28c32-947b-4ad5-8d69-213a205f06e8/images/079904a4-71af-492c-bb2f-b45a918e8a2e/fce4b64d-2444-4f11-b226-db75bb2960c2.meta
Status: Connected
Number of entries: 2

Brick node3storage.ovirt.local:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
