Re: Self Heal Confusion

Resending, as I did not reply to the list earlier; Thunderbird replied only to the poster and not to the list.

On 12/27/18 11:46 AM, Brett Holcomb wrote:

Thank you, I appreciate the help. Here is the information; let me know if you need anything else. I'm fairly new to Gluster.

Gluster version is 5.2

1. gluster v info

Volume Name: projects
Type: Distributed-Replicate
Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: gfssrv1:/srv/gfs01/Projects
Brick2: gfssrv2:/srv/gfs01/Projects
Brick3: gfssrv3:/srv/gfs01/Projects
Brick4: gfssrv4:/srv/gfs01/Projects
Brick5: gfssrv5:/srv/gfs01/Projects
Brick6: gfssrv6:/srv/gfs01/Projects
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.build-pgfid: on
changelog.changelog: on
changelog.capture-del-path: on

2.  gluster v status

Status of volume: projects
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfssrv1:/srv/gfs01/Projects           49154     0          Y       7213
Brick gfssrv2:/srv/gfs01/Projects           49154     0          Y       6932
Brick gfssrv3:/srv/gfs01/Projects           49154     0          Y       6920
Brick gfssrv4:/srv/gfs01/Projects           49154     0          Y       6732
Brick gfssrv5:/srv/gfs01/Projects           49154     0          Y       6950
Brick gfssrv6:/srv/gfs01/Projects           49154     0          Y       6879
Self-heal Daemon on localhost               N/A       N/A        Y       11484
Self-heal Daemon on gfssrv2                 N/A       N/A        Y       10366
Self-heal Daemon on gfssrv4                 N/A       N/A        Y       9872
Self-heal Daemon on srv-1-gfs3.corp.l1049h.net   N/A       N/A        Y       9892
Self-heal Daemon on gfssrv6                 N/A       N/A        Y       10372
Self-heal Daemon on gfssrv5                 N/A       N/A        Y       10761
 
Task Status of Volume projects
------------------------------------------------------------------------------
There are no active volume tasks

3. I've given the summary, since the actual list for two volumes is around 5900 entries.

Brick gfssrv1:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 85
Number of entries in heal pending: 85
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv2:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv3:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv4:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv5:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv6:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0
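
For reference, the per-brick summary above should come from something along the lines of:

$ gluster volume heal projects info summary

while the long per-entry list mentioned in point 3 would come from:

$ gluster volume heal projects info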

On 12/27/18 3:09 AM, Ashish Pandey wrote:
Hi Brett,

Could you please tell us more about the setup?

1 - gluster v info
2 - gluster v status
3 - gluster v heal <volname> info

This is the basic information needed to start debugging or to suggest any workaround.
It should always be included when asking such questions on the mailing list so that people can reply sooner.
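
Written out in full (gluster v ... is just shorthand for gluster volume ...), these amount to something like:

$ gluster volume info <volname>
$ gluster volume status <volname>
$ gluster volume heal <volname> info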


Note: Please hide IP addresses, hostnames, or any other information you don't want the world to see.

---
Ashish


From: "Brett Holcomb" <biholcomb@xxxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Sent: Thursday, December 27, 2018 12:19:15 AM
Subject: Re: Self Heal Confusion

Still no change in the pending heals. I found this reference, https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf, which mentions the default SELinux context for a brick and says that internal operations such as self-heal and rebalance should be ignored, but it does not elaborate on what "ignored" means: is it just not doing self-heal, or something else?

I did set SELinux to permissive and nothing changed. I'll try setting the bricks to the context mentioned in this PDF and see what happens.
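
In case it helps anyone following along, relabelling a brick path is normally done with something like the following (assuming the glusterd_brick_t type from those slides is the applicable one here):

$ semanage fcontext -a -t glusterd_brick_t "/srv/gfs01/Projects(/.*)?"
$ restorecon -Rv /srv/gfs01/Projects

and SELinux can be flipped to permissive temporarily with setenforce 0.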


On 12/20/18 8:26 PM, John Strunk wrote:
Assuming your bricks are up... yes, the heal count should be decreasing.

There is/was a bug wherein self-heal would stop healing even though the daemon was still running. I don't know whether your version is affected, but the remedy is simply to restart the self-heal daemon.
Force-start one of the volumes that has heals pending. The bricks are already running, but it will cause shd to restart and, assuming this is the problem, healing should begin...

$ gluster vol start my-pending-heal-vol force
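
To confirm that shd actually came back and that the counters start moving, something like this should work (same placeholder volume name as above):

$ gluster volume status my-pending-heal-vol shd
$ gluster volume heal my-pending-heal-vol statistics heal-count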

Others could better comment on the status of the bug.

-John


On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb <biholcomb@xxxxxxxxxx> wrote:
I have one volume with 85 entries pending heal and two more volumes
with 58,854 entries pending heal. These numbers come from the volume
heal info summary command, and they have stayed constant for two days
now. I've read the Gluster docs and many others; the Gluster docs just
give some commands, and the non-Gluster docs basically repeat them.
Given that it appears no self-healing is going on for these volumes,
I am confused as to why.

1.  If a self-heal daemon is listed on a host (all of mine show one with
a volume status command), can I assume it's enabled and running?

2.  I assume the volume that has all the self-heals pending has some
serious issues, even though I can access the files and directories on
it.  If self-heal is running, shouldn't the numbers be decreasing?

It appears to me that self-heal is not working properly, so how do I get
it to start working, or should I delete the volume and start over?
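
For reference, whether the daemon is enabled and its process alive can
presumably be checked directly with something like this (using the
volume name projects from elsewhere in the thread):

$ gluster volume get projects cluster.self-heal-daemon
$ ps -ef | grep glustershd

and a heal of the pending entries can presumably be requested manually with:

$ gluster volume heal projects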

I'm running Gluster 5.2 on CentOS 7, latest and fully updated.

Thank you.



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
