I wrote a script that searches the output of gluster volume heal
projects info, picks the brick I gave it, and then deletes any of
the listed files that actually exist in .glusterfs/dir1/dir2. I ran
this on the first host, which had 85 pending entries, and that
cleared them up, so I'll run it via ssh on the other two servers.
Hopefully that will clear everything up and glusterfs will be happy
again.
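A minimal sketch of that kind of cleanup script, assuming the standard .glusterfs GFID layout (the volume name "projects" is from the thread; the brick path, the "Brick host1:..." heal-info header, and the awk parsing are assumptions to adapt to your own setup):

```shell
#!/bin/sh
# Sketch only: BRICK and the "Brick host1:..." header below are
# assumptions -- substitute your own brick path and hostname.
VOL=projects
BRICK=/data/brick1

# A GFID like 1a2b3c4d-... is hard-linked inside the brick at
# .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
gfid_path() {
    printf '%s/.glusterfs/%s/%s/%s\n' "$BRICK" \
        "$(printf '%s' "$1" | cut -c1-2)" \
        "$(printf '%s' "$1" | cut -c3-4)" \
        "$1"
}

if command -v gluster >/dev/null 2>&1; then
    # Take only the <gfid:...> entries listed under the chosen brick.
    gluster volume heal "$VOL" info |
    awk '/^Brick host1:\/data\/brick1/ {in_brick=1; next}
         /^Brick /                     {in_brick=0}
         in_brick && /<gfid:/ {
             gsub(/.*<gfid:|>.*/, ""); print
         }' |
    while read -r gfid; do
        p=$(gfid_path "$gfid")
        # Delete only entries that actually exist on this brick.
        [ -e "$p" ] && rm -v "$p"
    done
fi
```

The `command -v gluster` guard just lets the sketch be read and tested on a machine without gluster installed.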
Thanks everyone for the help.
On 12/31/18 4:39 AM, Davide Obbi wrote:
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
cluster.quorum-reads no
Where exactly do I remove the gfid entries from - the .glusterfs
directory? --> yes; I can't remember exactly where, but try a
find in the brick paths with the gfid, it should return
something
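For example (the brick path and GFID below are placeholders, not values from the thread):

```shell
BRICK=/data/brick1                            # assumed brick mount point
GFID=1a2b3c4d-5e6f-7a8b-9c0d-ef0123456789     # placeholder GFID from heal info
# The GFID entry lives at .glusterfs/<first 2 chars>/<next 2 chars>/<gfid>,
# so a name match under .glusterfs locates it:
if [ -d "$BRICK/.glusterfs" ]; then
    find "$BRICK/.glusterfs" -name "$GFID"
fi
```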
Where do I put the cluster.heal-timeout option - which file?
--> it isn't set in a file; run: gluster volume set volumename option value
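Concretely, for the volume in this thread (the 300-second value is only an example of a more aggressive schedule than the default):

```shell
# Run on any gluster server node; "projects" is the volume name
# from the thread, and 300 is an illustrative value only.
if command -v gluster >/dev/null 2>&1; then
    gluster volume set projects cluster.heal-timeout 300
    gluster volume get projects cluster.heal-timeout   # verify the new value
fi
```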
That is probably the case, as a lot of files were deleted some
time ago.
I'm on version 5.2 but was on 3.12 until about a week ago.
Here is the quorum info. I'm running a distributed replicated
volume in a 2 x 3 = 6 layout:
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
cluster.quorum-reads no
Where exactly do I remove the gfid entries from - the .glusterfs
directory? Do I just delete all the directories and files under
this directory?
Where do I put the cluster.heal-timeout option - which file?
I think you've hit on the cause of the issue. Thinking back, we've
had some extended power outages, and due to a misconfiguration in
the swap file device name a couple of the nodes did not come up. I
didn't catch it for a while, so maybe the deletes occurred then.
Thank you.
On 12/31/18 2:58 AM, Davide Obbi wrote:
> If the long GFID does not correspond to any file, it could mean
> the file was deleted by the client mounting the volume. I think
> this happens when the delete was issued while the number of
> active bricks did not reach quorum majority, or a second brick
> was taken down while another was down or had not finished the
> self-heal; the latter is the more likely case.
> It would be interesting to see:
> - what version of glusterfs you are running; it happened to me
>   with 3.12
> - the volume quorum rules: "gluster volume get vol all | grep quorum"
>
> To clean it up, if I remember correctly, it should be possible
> to delete the gfid entries from the brick mounts on the
> glusterfs server nodes reporting the files to heal.
>
> As a side note, you might want to consider changing the
> self-heal timeout to a more aggressive schedule via the
> cluster.heal-timeout option.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Davide Obbi
System Administrator
Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
Direct +31207031558