Brett,
On Sat, Jan 5, 2019 at 3:54 AM Brett Holcomb <biholcomb@xxxxxxxxxx> wrote:
I wrote a script that searches the output of gluster volume heal projects info, picks the brick I gave it, and then deletes any of the listed files that actually exist in .glusterfs/dir1/dir2. I did this on the first host, which had 85 pending entries, and that cleared them up, so I'll do it via ssh on the other two servers.
Hopefully that will clear it up and glusterfs will be happy again.
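(For readers of the archive: Brett's script isn't posted in this thread, but a minimal sketch of the approach he describes might look like the bash below. The volume name "projects" comes from the thread; the host:/path brick argument, the awk parsing, and the exact heal-info output format are my assumptions, not his actual code.)

  #!/bin/bash
  # Sketch: remove heal-info entries whose gfid files still exist
  # under this brick's .glusterfs directory. Assumes the brick is
  # passed as host:/path and that "gluster volume heal <vol> info"
  # lists entries as <gfid:...> lines under a "Brick host:/path" header.
  VOLUME="projects"
  BRICK="$1"                      # e.g. gfs1:/data/brick1 (assumed format)
  BRICK_PATH="${BRICK#*:}"        # strip the host: prefix

  gluster volume heal "$VOLUME" info |
  awk -v brick="Brick $BRICK" '
      $0 == brick           { in_brick = 1; next }  # start of our brick
      /^Brick /             { in_brick = 0 }        # start of another brick
      in_brick && /^<gfid:/ {
          gsub(/[<>]/, ""); sub(/^gfid:/, ""); print
      }' |
  while read -r gfid; do
      # gluster stores each gfid file as .glusterfs/aa/bb/<gfid>,
      # where aa and bb are the first two character pairs of the gfid
      path="$BRICK_PATH/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
      if [ -f "$path" ]; then
          echo "removing $path"
          rm -f "$path"
      fi
  done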
If things are fine now, consider posting those scripts as a patch to glusterfs, or posting them in your own GitHub account, so that in the future we can refer others to the same scripts when they run into trouble. Thanks.
-Amar
_______________________________________________
Thanks everyone for the help.
On 12/31/18 4:39 AM, Davide Obbi wrote:
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
cluster.quorum-reads no
Where exactly do I remove the gfid entries from - the .glusterfs
directory? --> yes; I can't remember exactly where, but try a find in the brick paths with the gfid and it should return something
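(For example, with a placeholder gfid and brick path; gluster keeps each gfid file two directory levels deep under .glusterfs, named by the gfid's first two character pairs:)

  # both the gfid and the brick path below are placeholders
  gfid="0430c56f-aaaa-bbbb-cccc-1234567890ab"
  find /data/brick1/.glusterfs -name "$gfid"
  # or build the path directly from the gfid's first two character pairs
  ls -l "/data/brick1/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"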
Where do I put the cluster.heal-timeout option - which file? --> it is a volume option, not a file setting: gluster volume set volumename option value
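(For example, for the "projects" volume from this thread; the 300-second value here is only an illustration:)

  gluster volume set projects cluster.heal-timeout 300
  gluster volume get projects cluster.heal-timeout   # verify the change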
On Mon, Dec 31, 2018 at 10:34 AM Brett Holcomb <biholcomb@xxxxxxxxxx> wrote:
That is probably the case, as a lot of files were deleted some time ago.
I'm on version 5.2 but was on 3.12 until about a week ago.
Here is the quorum info. I'm running a distributed replicated volume
in a 2 x 3 = 6 configuration.
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
cluster.quorum-reads no
Where exactly do I remove the gfid entries from - the .glusterfs
directory? Do I just delete all the directories and files under this
directory?
Where do I put the cluster.heal-timeout option - which file?
I think you've hit on the cause of the issue. Thinking back, we've had
some extended power outages, and due to a misconfiguration in the swap
file device name a couple of the nodes did not come up. I didn't
catch it for a while, so maybe the deletes occurred then.
Thank you.
On 12/31/18 2:58 AM, Davide Obbi wrote:
> if the long GFID does not correspond to any file, it could mean the
> file has been deleted by the client mounting the volume. I think this
> happens when the delete was issued while the number of active bricks
> did not reach quorum majority, or when a second brick was taken down
> while another was down or had not finished the self-heal; the latter
> is more likely.
> It would be interesting to see:
> - what version of glusterfs you are running; it happened to me with 3.12
> - volume quorum rules: "gluster volume get vol all | grep quorum"
>
> To clean it up, if I remember correctly, it should be possible to delete
> the gfid entries from the brick mounts on the glusterfs server nodes
> reporting the files to heal.
>
> As a side note, you might want to consider changing the self-heal
> timeout to a more aggressive schedule via the cluster.heal-timeout option
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Davide Obbi
System Administrator
Booking.com B.V.
Vijzelstraat 66-80, Amsterdam 1017HL, Netherlands
Direct +31207031558