Hi Brett,

First, the answers to your questions:

1. If a self-heal daemon is listed on a host (all of mine show one with
a volume status command) can I assume it's enabled and running?

For your volume "projects", the self-heal daemon is up and running.
2. I assume the volume that has all the self-heals pending has some
serious issues even though I can access the files and directories on
it. If self-heal is running shouldn't the numbers be decreasing?

Yes, it should heal the entries, and the number of entries reported by
the "gluster v heal <volname> info" command should be decreasing.
It appears to me self-heal is not working properly, so how do I get it
to start working, or should I delete the volume and start over?

As you can access all the files from the mount point, I think the
volume and the files are in a good state as of now. I don't think you
should consider deleting your volume before trying to fix it. If there
is no fix, or the fix is taking too long, you can go ahead with that
option.
-----------------------
Why are all these options off?

performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off

Although this should not matter for your issue, I think you should
enable all of the above unless you have a reason not to.
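If you do decide to turn them back on, that is one "gluster volume
set" per option. A sketch, assuming the volume name "projects" from
your output below:

```shell
# Re-enable the performance translators one by one; each takes effect
# without restarting the volume.
for opt in quick-read parallel-readdir readdir-ahead write-behind read-ahead; do
    gluster volume set projects performance.$opt on
done

# Confirm the new values.
gluster volume get projects all | grep '^performance\.'
```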
--------------------
I would like you to perform the following steps and provide some more
information:

1 - Try to restart self-heal and see if that works.
"gluster v start <volname> force" will kill and restart the self-heal
processes.

2 - If step 1 is not fruitful, get the list of entries that need to be
healed and pick one of them to heal. I mean we should focus on one
entry to find out why it is not getting healed, instead of all the
5900 entries. Let's call it entry1.

3 - Now access entry1 from the mount point: read and write on it and
see if it has been healed, then check heal info again. Accessing a
file from the mount point triggers client-side heal, which could also
heal the file.

4 - Check the logs in /var/log/glusterfs; the mount logs and the
glustershd logs should be checked and provided.

5 - Get the extended attributes of entry1 from all the bricks. If the
path of entry1 on the mount point is /a/b/c/entry1, then you have to
run the following command on all the nodes:

getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1

Please provide the output of the above command too.
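Steps 2 and 3 as a concrete shell sketch. The volume name "projects"
and a FUSE mount at /mnt/projects are assumptions here; adjust both to
your setup:

```shell
# Step 2: grab the first pending entry reported by heal info. This picks
# a path-style entry; entries shown as <gfid:...> need a different lookup.
entry1=$(gluster volume heal projects info | awk '/^\//{print; exit}')
echo "picked: $entry1"

# Step 3: read it through the mount point to trigger client-side heal,
# then re-check the heal counters.
dd if="/mnt/projects${entry1}" of=/dev/null bs=1M
gluster volume heal projects info summary
```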
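To make the getfattr output from step 5 easier to read, here is a
small bash helper for the trusted.afr.* values it prints. This is a
sketch: it assumes the standard AFR changelog layout of three
big-endian 32-bit counters (data, metadata, and entry pending
operations); a non-zero counter on one brick's copy means that brick
is blaming another for pending heals.

```shell
# Decode a trusted.afr.* hex value (as printed by getfattr -d -e hex)
# into its three 32-bit counters. Requires bash for the substring syntax.
decode_afr() {
    local hex=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x${hex:0:8}" "0x${hex:8:8}" "0x${hex:16:8}"
}

# Example with a made-up value of the kind getfattr might print:
decode_afr 0x000000020000000100000000   # -> data=2 metadata=1 entry=0
```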
---
Ashish
From: "Brett Holcomb"
<biholcomb@xxxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Sent: Friday, December 28, 2018 3:49:50 AM
Subject: Re: Self Heal Confusion
Resend as I did not reply to the list earlier. TBird
responded to the poster and not the list.
On 12/27/18 11:46 AM, Brett
Holcomb wrote:
Thank you, I appreciate the help. Here is the information; let me know
if you need anything else. I'm fairly new to Gluster.

Gluster version is 5.2.
1. gluster v info
Volume Name: projects
Type: Distributed-Replicate
Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: gfssrv1:/srv/gfs01/Projects
Brick2: gfssrv2:/srv/gfs01/Projects
Brick3: gfssrv3:/srv/gfs01/Projects
Brick4: gfssrv4:/srv/gfs01/Projects
Brick5: gfssrv5:/srv/gfs01/Projects
Brick6: gfssrv6:/srv/gfs01/Projects
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.build-pgfid: on
changelog.changelog: on
changelog.capture-del-path: on
2. gluster v status
Status of volume: projects
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfssrv1:/srv/gfs01/Projects               49154     0          Y       7213
Brick gfssrv2:/srv/gfs01/Projects               49154     0          Y       6932
Brick gfssrv3:/srv/gfs01/Projects               49154     0          Y       6920
Brick gfssrv4:/srv/gfs01/Projects               49154     0          Y       6732
Brick gfssrv5:/srv/gfs01/Projects               49154     0          Y       6950
Brick gfssrv6:/srv/gfs01/Projects               49154     0          Y       6879
Self-heal Daemon on localhost                   N/A       N/A        Y       11484
Self-heal Daemon on gfssrv2                     N/A       N/A        Y       10366
Self-heal Daemon on gfssrv4                     N/A       N/A        Y       9872
Self-heal Daemon on srv-1-gfs3.corp.l1049h.net  N/A       N/A        Y       9892
Self-heal Daemon on gfssrv6                     N/A       N/A        Y       10372
Self-heal Daemon on gfssrv5                     N/A       N/A        Y       10761
Task Status of Volume projects
------------------------------------------------------------------------------
There are no active volume tasks
3. I've given the summary since the actual list for two
volumes is around 5900 entries.
Brick gfssrv1:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 85
Number of entries in heal pending: 85
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick gfssrv2:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick gfssrv3:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick gfssrv4:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick gfssrv5:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick gfssrv6:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0
On 12/27/18 3:09 AM, Ashish Pandey wrote:

Hi Brett,

Could you please tell us more about the setup?

1 - gluster v info
2 - gluster v status
3 - gluster v heal <volname> info

This is the very basic information needed to start debugging or to
suggest any workaround. It should always be included when asking such
questions on the mailing list so that people can reply sooner.

Note: Please hide IP addresses/hostnames or any other information you
don't want the world to see.
---
Ashish
From: "Brett Holcomb"
<biholcomb@xxxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Sent: Thursday, December 27, 2018 12:19:15
AM
Subject: Re: Self Heal
Confusion
Still no change in the heals pending. I found this reference,
https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf,
which mentions the default SELinux context for a brick and says that
internal operations such as self-heal and rebalance should be ignored,
but it does not elaborate on what "ignored" means: is it just not
doing self-heal, or something else?

I did set SELinux to permissive and nothing changed. I'll try setting
the bricks to the context mentioned in this pdf and see what happens.
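Relabeling the bricks would look something like this. A sketch only:
the brick path /srv/gfs01 comes from the volume info above, and the
glusterd_brick_t type is an assumption based on those slides, so
verify the type name against the pdf before running it:

```shell
# Record the label rule for the brick tree, apply it, and confirm.
semanage fcontext -a -t glusterd_brick_t "/srv/gfs01(/.*)?"
restorecon -Rv /srv/gfs01
ls -Zd /srv/gfs01
```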
On 12/20/18 8:26 PM, John Strunk wrote:
Assuming your bricks are up... yes,
the heal count should be decreasing.
There is/was a bug wherein self-heal would
stop healing but would still be running. I
don't know whether your version is affected,
but the remedy is to just restart the
self-heal daemon.
Force start one of the volumes that has
heals pending. The bricks are already running,
but it will cause shd to restart and, assuming
this is the problem, healing should begin...
$ gluster vol start my-pending-heal-vol force
Others could better comment on the status
of the bug.
-John
I have one volume that has 85 pending entries in healing and two more
volumes with 58,854 entries in healing pending. These numbers are from
the volume heal info summary command. They have stayed constant for
two days now. I've read the Gluster docs and many more; the Gluster
docs just give some commands, and non-Gluster docs basically repeat
them. Given that it appears no self-healing is going on for my volume,
I am confused as to why.

1. If a self-heal daemon is listed on a host (all of mine show one with
a volume status command) can I assume it's enabled and running?

2. I assume the volume that has all the self-heals pending has some
serious issues even though I can access the files and directories on
it. If self-heal is running shouldn't the numbers be decreasing?

It appears to me self-heal is not working properly, so how do I get it
to start working, or should I delete the volume and start over?

I'm running Gluster 5.2 on CentOS 7, latest and updated.

Thank you.
Thank you.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users