Are there any operations you performed before the self-heal daemon process went down? Is there a stack trace in the self-heal daemon's log file? It should be in /var/log/glusterfs/glustershd.log if you installed glusterfs using RPMs. It would be great if you could attach the self-heal daemon's log file so we can debug the issue further.

Pranith.
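P.S. If the daemon did crash, glustershd.log should end with a backtrace. As a quick check (a sketch; the exact crash markers can vary across glusterfs versions), something like this should surface it:

    # look for the banner glusterfs daemons print on a fatal signal
    grep -n -A 20 -e 'signal received' -e 'pending frames' \
        /var/log/glusterfs/glustershd.log | tail -40

If that prints nothing, the daemon most likely exited or was killed rather than crashed, and the last few timestamped lines of the log are the next place to look.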
----- Original Message -----
> From: "Khoi Mai" <KHOIMAI at UP.COM>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Thursday, May 2, 2013 8:00:24 PM
> Subject: Re: Volume heal daemon 3.4alpha3
>
> [root at omhq1439 ~]# gluster volume info
>
> Volume Name: khoi
> Type: Replicate
> Volume ID: 0f23eb76-f285-4082-9747-7bf5088c4f10
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/test
> Brick2: omdx1448:/gluster/test
> Options Reconfigured:
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
> features.quota: on
> features.limit-usage: /:2GB
> network.ping-timeout: 5
>
> Volume Name: dyn_engineering
> Type: Replicate
> Volume ID: 3d5862f6-8ff8-4304-b41d-a83d9d2cea2c
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/engineering
> Brick2: omdx1448:/gluster/dynamic/engineering
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:1GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_shn
> Type: Replicate
> Volume ID: c8e16226-83e6-4117-a327-86774e4fb8f7
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/shn
> Brick2: omdx1448:/gluster/dynamic/shn
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:2GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: static
> Type: Replicate
> Volume ID: 78919955-2536-4c72-bd8d-b675c8e6aae6
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/static/content
> Brick2: omdx1448:/gluster/static/content
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:400GB
> performance.write-behind-window-size: 524288
> performance.read-ahead: off
> performance.cache-refresh-timeout: 1
> performance.cache-size: 1073741824
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
>
> Volume Name: dyn_admin
> Type: Replicate
> Volume ID: dd2c186b-d0c2-4c3a-8a07-ac4c028098e9
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/admin
> Brick2: omdx1448:/gluster/dynamic/admin
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:1GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_mechanical
> Type: Replicate
> Volume ID: 4c0e9857-6694-4a65-943e-d8fdd60ee41c
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/mechanical
> Brick2: omdx1448:/gluster/dynamic/mechanical
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:3GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_wf_content
> Type: Replicate
> Volume ID: 48702293-8806-4706-a42d-fa343939d555
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/wf_content
> Brick2: omdx1448:/gluster/dynamic/wf_content
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:4GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_weblogic
> Type: Replicate
> Volume ID: 23775a71-38d9-496f-b340-e5167271dd31
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/weblogic
> Brick2: omdx1448:/gluster/dynamic/weblogic
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:7GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_ert
> Type: Replicate
> Volume ID: 28b0faed-6f40-4df2-8683-857412fa2afd
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/ert
> Brick2: omdx1448:/gluster/dynamic/ert
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:5GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_wf_content_rd
> Type: Replicate
> Volume ID: 33a9e8ca-0689-4ce8-b20b-4172062e70b3
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/RD_content
> Brick2: omdx1448:/gluster/dynamic/RD_content
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:2GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_tibco
> Type: Replicate
> Volume ID: cc22b891-b21b-48e8-a88b-7f46db7b145f
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/tibco
> Brick2: omdx1448:/gluster/dynamic/tibco
> Options Reconfigured:
> features.quota: on
> features.limit-usage: /:1GB
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> performance.write-behind-window-size: 524288
>
> Volume Name: dyn_coldfusion
> Type: Replicate
> Volume ID: 4f6d1aa0-4e72-439d-86be-5b98d3960974
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: omhq1439:/gluster/dynamic/coldfusion
> Brick2: omdx1448:/gluster/dynamic/coldfusion
> Options Reconfigured:
> performance.read-ahead: off
> performance.cache-size: 1073741824
> performance.cache-refresh-timeout: 1
> performance.write-behind-window-size: 524288
> features.limit-usage: /:5GB
> features.quota: on
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
> [root at omhq1439 ~]#
>
> Khoi Mai
> Union Pacific Railroad
> Distributed Engineering & Architecture
> Project Engineer
>
> From: Pranith Kumar Karampuri <pkarampu at redhat.com>
> To: Khoi Mai <KHOIMAI at UP.COM>
> Cc: gluster-users at gluster.org
> Date: 05/01/2013 11:34 PM
> Subject: Re: Volume heal daemon 3.4alpha3
>
> Could you attach gluster volume status output, gluster volume info?
>
> Pranith.
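On 3.4, gluster volume status should also report a "Self-heal Daemon" row for each node, with an Online flag and a PID, which is the quickest way to see on which node glustershd is down. For example, using the volume from this thread:

    # look for the "Self-heal Daemon on ..." rows in the output
    gluster volume status dyn_coldfusion

    # confirm on each node whether the process itself is running
    ps aux | grep '[g]lustershd'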
> ----- Original Message -----
> > From: "Khoi Mai" <KHOIMAI at UP.COM>
> > To: gluster-users at gluster.org
> > Sent: Wednesday, May 1, 2013 2:41:07 AM
> > Subject: Volume heal daemon 3.4alpha3
> >
> > gluster> volume heal dyn_coldfusion
> > Self-heal daemon is not running. Check self-heal daemon log file.
> > gluster>
> >
> > Is there a specific log? When I check /var/log/glusterfs/glustershd.log, I see:
> >
> > glustershd.log:[2013-04-30 15:51:40.463259] E
> > [afr-self-heald.c:409:_crawl_proceed] 0-dyn_coldfusion-replicate-0:
> > Stopping crawl for dyn_coldfusion-client-1 , subvol went down
> >
> > I'm not sure what that means. Can someone please explain? I've tried to
> > execute the heal command and get this output:
> >
> > gluster> volume heal dyn_coldfusion full
> > Self-heal daemon is not running. Check self-heal daemon log file.
> > gluster> volume heal dyn_coldfusion info
> > Gathering Heal info on volume dyn_coldfusion has been successful
> >
> > Brick omhq1439:/gluster/dynamic/coldfusion
> > Status: self-heal-daemon is not running on
> > c2262e8b-0940-48f3-9e5c-72ee18d4c653
> >
> > Brick omdx1448:/gluster/dynamic/coldfusion
> > Status: self-heal-daemon is not running on
> > 127048c2-92ed-480e-bca8-d6449ee54b1e
> > gluster>
> >
> > Khoi Mai
> > Union Pacific Railroad
> > Distributed Engineering & Architecture
> > Project Engineer
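The "subvol went down" error in the quoted log suggests glustershd lost its connection to one of the bricks (dyn_coldfusion-client-1 should correspond to the second brick, omdx1448:/gluster/dynamic/coldfusion) and stopped its crawl. Once both bricks are confirmed up, a common way to respawn a dead self-heal daemon without disturbing the running bricks is:

    # re-runs any volume daemons (bricks, NFS server, self-heal) that are down
    gluster volume start dyn_coldfusion force

Restarting glusterd on the affected node should have the same effect, since glusterd is what spawns glustershd. Treat this as a sketch of the usual recovery path, not a substitute for finding out why the daemon went down in the first place.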