Re: [External] Re: Self Heal Confusion

Healing time set to 120 seconds for now.

Just to make sure I understand: I need to take the output of "gluster volume heal projects info" and put it in a file. Then, for each brick the output lists as having unhealed entries, I look for each gfid from that file in the brick's .glusterfs directory and delete the matching file if it exists. If it doesn't exist, I don't worry about it.

These bricks have unhealed entries listed:

/srv/gfs01/Projects/.glusterfs - 85 files
/srv/gfs05/Projects/.glusterfs - 58854 files
/srv/gfs06/Projects/.glusterfs - 58854 files
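The plan starts by pulling the gfids out of the heal info output. Below is a minimal Python sketch of that step. It assumes the usual layout of "gluster volume heal VOLUME info" output: a "Brick host:/path" header per brick, unresolved entries printed as <gfid:UUID>, and resolved ones printed as plain paths. parse_heal_info is a hypothetical helper name, not part of GlusterFS.

```python
import re
from collections import defaultdict

# Assumed heal info format: entries whose path could not be resolved are
# printed as "<gfid:UUID>" under a "Brick host:/brick/path" header.
GFID_RE = re.compile(r"^<gfid:([0-9a-fA-F-]{36})>$")


def parse_heal_info(text):
    """Map each brick path to the gfids listed under it in heal info output."""
    gfids = defaultdict(list)
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # "Brick host:/brick/path" -> keep the path after the first colon
            brick = line[len("Brick "):].split(":", 1)[-1]
            continue
        match = GFID_RE.match(line)
        if brick is not None and match:
            gfids[brick].append(match.group(1).lower())
        # Resolved entries (plain paths), "Status:" and "Number of entries:"
        # lines do not match and are ignored.
    return dict(gfids)
```

For example, parse_heal_info(open("heal-info.txt").read()) (hypothetical file name) returns a dict keyed by brick path. Entries that heal info already resolved to a path are skipped, because there is no gfid to look up for them.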

Script time!

On 12/31/18 4:39 AM, Davide Obbi wrote:
cluster.quorum-type                     auto
cluster.quorum-count                    (null)
cluster.server-quorum-type              off
cluster.server-quorum-ratio             0
cluster.quorum-reads                    no

Where exactly do I remove the gfid entries from? The .glusterfs
directory? --> Yes. I can't remember exactly where, but try running find in the brick paths with the gfid; it should return something.
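Rather than running find over a whole brick, the gfid entry's location can usually be computed directly. GlusterFS keeps it at <brick>/.glusterfs/<first two hex chars>/<next two hex chars>/<gfid>: a hardlink for regular files and a symlink for directories. The sketch below assumes that layout. gfid_path and existing_gfid_paths are hypothetical names.

```python
import os


def gfid_path(brick, gfid):
    """Return the expected .glusterfs location of a gfid on one brick."""
    gfid = gfid.lower()
    # Assumed layout: <brick>/.glusterfs/<hex 1-2>/<hex 3-4>/<full gfid>
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def existing_gfid_paths(brick, gfids):
    """Return only the gfid entries that are actually present on the brick."""
    paths = (gfid_path(brick, g) for g in gfids)
    # lexists() so symlinks (used for directory gfids) count even if dangling
    return [p for p in paths if os.path.lexists(p)]
```

Any gfid missing from the returned list is the "if it doesn't exist, don't worry about it" case.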

Where do I put the cluster.heal-timeout option? Which file? --> Not in a file; set it with the CLI: gluster volume set volumename option value
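If the heal timeout is going to be set from the cleanup script, a small wrapper around that same CLI call might look like the sketch below. The function names are hypothetical. The only thing it runs is "gluster volume set", and by default it just prints the command.

```python
import subprocess


def volume_set_cmd(volume, option, value):
    """Build argv for: gluster volume set VOLUME OPTION VALUE."""
    return ["gluster", "volume", "set", volume, option, str(value)]


def set_heal_timeout(volume, seconds, dry_run=True):
    """Set cluster.heal-timeout; by default only prints the command."""
    if int(seconds) <= 0:
        raise ValueError("heal timeout must be a positive number of seconds")
    cmd = volume_set_cmd(volume, "cluster.heal-timeout", int(seconds))
    if dry_run:
        print(" ".join(cmd))
    else:
        # check=True raises CalledProcessError if the gluster CLI fails
        subprocess.run(cmd, check=True)
    return cmd
```

set_heal_timeout("projects", 120) prints the command for the 120-second value used in this thread, and dry_run=False actually runs it. Afterwards, "gluster volume get projects cluster.heal-timeout" should show the new value.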

On Mon, Dec 31, 2018 at 10:34 AM Brett Holcomb <biholcomb@xxxxxxxxxx> wrote:
That is probably the case as a lot of files were deleted some time ago.

I'm on version 5.2 but was on 3.12 until about a week ago.

Here is the quorum info. I'm running a distributed-replicated volume,
2 x 3 = 6 bricks.
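In a distributed-replicated volume, GlusterFS builds each replica set from consecutive bricks, in the order they were given to gluster volume create (the order gluster volume info lists them). So 2 x 3 = 6 means two sets of three. The sketch below assumes the bricks were created in gfs01 to gfs06 order, which the thread does not state. If that holds, gfs05 and gfs06 are replicas of each other, which would be consistent with their identical 58854-entry counts. replica_sets is a hypothetical helper.

```python
def replica_sets(bricks, replica):
    """Split bricks (in volume-create order) into consecutive replica sets."""
    if replica <= 0 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a positive multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


# Hypothetical brick order for the 2 x 3 volume: gfs01 .. gfs06
bricks = [f"/srv/gfs{n:02d}/Projects" for n in range(1, 7)]
for number, group in enumerate(replica_sets(bricks, 3), start=1):
    print(number, group)
```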

cluster.quorum-type                     auto
cluster.quorum-count                    (null)
cluster.server-quorum-type              off
cluster.server-quorum-ratio             0
cluster.quorum-reads                    no
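Here is my reading of the GlusterFS documentation on these settings. With cluster.quorum-type set to auto and no quorum-count, client quorum for each replica set is met when more than half of its bricks are up, or when exactly half are up and those include the first brick. When quorum is lost, writes to that replica set fail. cluster.server-quorum-type off means glusterd's server-side quorum is not enforced. The sketch below expresses that client rule; it is not Gluster's actual code.

```python
def client_quorum_met(bricks_up, replica_count, first_brick_up):
    """Approximate the cluster.quorum-type=auto rule for one replica set."""
    if 2 * bricks_up > replica_count:
        # More than half of the replica set is reachable
        return True
    # Exactly half is enough only when the first brick is among them
    return 2 * bricks_up == replica_count and first_brick_up
```

For replica 3, any two bricks satisfy this. A delete can therefore succeed with one brick down and then depend on self-heal to reach the third brick, which fits the "second brick taken down before self-heal finished" scenario.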

Where exactly do I remove the gfid entries from? The .glusterfs
directory? Do I just delete all the directories and files under that
directory?

Where do I put the cluster.heal-timeout option? Which file?

I think you've hit on the cause of the issue. Thinking back, we've had
some extended power outages, and because of a misconfigured device name
for the swap file a couple of the nodes did not come up. I didn't catch
it for a while, so maybe the deletes occurred then.

Thank you.

On 12/31/18 2:58 AM, Davide Obbi wrote:
> If the long GFID does not correspond to any file, it could mean the
> file has been deleted by the client mounting the volume. I think this
> happens when the delete was issued while the number of active bricks
> did not reach a quorum majority, or when a second brick was taken down
> while another was still down or had not finished self-heal; the latter
> is more likely.
> It would be interesting to see:
> - what version of GlusterFS you are running (it happened to me with 3.12)
> - the volume quorum rules: "gluster volume get vol all | grep quorum"
>
> To clean it up, if I remember correctly, it should be possible to delete
> the gfid entries from the brick mounts on the GlusterFS server nodes
> reporting the files to heal.
>
> As a side note, you might want to consider a more aggressive self-heal
> schedule via the cluster.heal-timeout option.
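Before deleting gfid entries, it may be worth confirming that each one is really orphaned. This assumes the usual layout, where a regular file's .glusterfs entry is a hardlink to the named file on the brick; a link count of 1 then suggests the named file no longer exists there. The sketch below is a heuristic, not a Gluster tool. It ignores symlinks (directory gfids) and only reports candidates; it never deletes anything. Both function names are hypothetical.

```python
import os
import stat


def is_orphan_gfid_file(path):
    """Heuristic: is this .glusterfs gfid entry a file with no other name?"""
    st = os.lstat(path)  # lstat: do not follow symlinks (directory gfids)
    # A regular file's gfid entry is assumed to be a hardlink to the file
    # on the brick, so a link count of 1 suggests the named file is gone.
    return stat.S_ISREG(st.st_mode) and st.st_nlink == 1


def orphan_candidates(paths):
    """Report (never delete) gfid entries that look orphaned."""
    return [p for p in paths if os.path.lexists(p) and is_orphan_gfid_file(p)]
```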
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Davide Obbi
System Administrator

Booking.com B.V.