Could you please have a look at my issue if you have time (or at least provide a workaround)?
BR
Salam
From: Shaik Salam/HYD/TCS
To: "Sanju Rakonde" <srakonde@xxxxxxxxxx>
Cc: "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>, "Murali Kottakota" <murali.kottakota@xxxxxxx>
Date: 01/23/2019 05:50 PM
Subject: Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands
Hi Sanju,
Please find the requested information below.
Sorry to repeat myself: I ran the start force command again after enabling DEBUG brick logging, taking one volume as an example.
Please correct me if I am doing something wrong.
[root@master ~]# oc rsh glusterfs-storage-vll7x
sh-4.2# gluster volume info vol_3442e86b6d994a14de73f1b8c82cf0b8
Volume Name: vol_3442e86b6d994a14de73f1b8c82cf0b8
Type: Replicate
Volume ID: 15477f36-22e8-4757-a0ce-9000b63fa849
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.3.6:/var/lib/heketi/mounts/vg_ca57f326195c243be2380ce4e42a4191/brick_952d75fd193c7209c9a81acbc23a3747/brick
Brick2: 192.168.3.5:/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
Brick3: 192.168.3.15:/var/lib/heketi/mounts/vg_462ea199185376b03e4b0317363bb88c/brick_1736459d19e8aaa1dcb5a87f48747d04/brick
Options Reconfigured:
diagnostics.brick-log-level: INFO
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       108434
Self-heal Daemon on matrix1.matrix.orange.l
ab                                          N/A       N/A        Y       69525
Self-heal Daemon on matrix2.matrix.orange.l
ab                                          N/A       N/A        Y       18569
gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG
volume set: success
sh-4.2# gluster volume get vol_3442e86b6d994a14de73f1b8c82cf0b8 all |grep log
cluster.entry-change-log on
cluster.data-change-log on
cluster.metadata-change-log on
diagnostics.brick-log-level DEBUG
sh-4.2# cd /var/log/glusterfs/bricks/
sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
-rw-------. 1 root root 0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log >>> Nothing in log
-rw-------. 1 root root 189057 Jan 18 09:20 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log-20190120
[2019-01-23 11:49:32.475956] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd
[2019-01-23 11:49:32.483191] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd
[2019-01-23 11:48:59.111292] W [MSGID: 106036] [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: Snapshot list failed
[2019-01-23 11:50:14.112271] E [MSGID: 106026] [glusterd-snapshot.c:3962:glusterd_handle_snapshot_list] 0-management: Volume (vol_63854b105c40802bdec77290e91858ea) does not exist [Invalid argument]
[2019-01-23 11:50:14.112305] W [MSGID: 106036] [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: Snapshot list failed
[2019-01-23 11:50:20.322902] I [glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
[2019-01-23 11:50:20.322925] I [MSGID: 106142] [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick on port 49165
[2019-01-23 11:50:20.327557] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2019-01-23 11:50:20.327586] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-01-23 11:50:20.327604] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
[2019-01-23 11:50:20.337735] I [MSGID: 106568] [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 69525
[2019-01-23 11:50:21.338058] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd service is stopped
[2019-01-23 11:50:21.338180] I [MSGID: 106567] [glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting glustershd service
[2019-01-23 11:50:21.348234] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2019-01-23 11:50:21.348285] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped
[2019-01-23 11:50:21.348866] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2019-01-23 11:50:21.348883] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped
[2019-01-23 11:50:22.356502] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-23 11:50:22.368845] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       109550
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       52557
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       16946
Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------
There are no active volume tasks
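For anyone following along, the offline brick can be picked out of the wrapped status output mechanically. This is only a minimal sketch and not something from the original exchange: offline_bricks is a hypothetical helper name, and it assumes the plain-text layout shown above, where a brick path may wrap over several lines and the final line of each brick entry ends with four whitespace-separated columns (TCP port, RDMA port, Online, Pid).

```shell
#!/bin/sh
# offline_bricks (hypothetical helper): read `gluster volume status` text on
# stdin and print the full path of every brick whose Online column is "N".
# Assumption: brick paths contain no spaces and the column tail is
# "<tcp> <rdma> <Y|N> <pid>" on the last line of each brick entry.
offline_bricks() {
    awk '
        # A new brick entry starts; the path fragment is the 2nd field.
        /^Brick /  { name = ""; in_brick = 1; first = 2 }
        # Continuation lines carry the path fragment in the 1st field.
        !/^Brick / { first = 1 }
        in_brick && NF >= first {
            name = name $first
            # The last line of an entry has the four status columns.
            if (NF >= first + 4 && $(NF - 1) ~ /^[YN]$/) {
                if ($(NF - 1) == "N") print name
                in_brick = 0
            }
        }
    '
}
```

Usage would be along the lines of `gluster volume status <vol-name> | offline_bricks`; for the output above it should print only the 192.168.3.5 brick path.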
From: "Sanju Rakonde" <srakonde@xxxxxxxxxx>
To: "Shaik Salam" <shaik.salam@xxxxxxx>
Cc: "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>, "Murali Kottakota" <murali.kottakota@xxxxxxx>
Date: 01/23/2019 02:15 PM
Subject: Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands
Hi Shaik,
I can see the errors below in the glusterd logs.
[2019-01-22 09:20:17.540196] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_e1aa1283d5917485d88c4a742eeff422/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_9e7c382e5f853d471c347bc5590359af-brick.pid
[2019-01-22 09:20:17.546408] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_f0ed498d7e781d7bb896244175b31f9e/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_47ed9e0663ad0f6f676ddd6ad7e3dcde-brick.pid
[2019-01-22 09:20:17.552575] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_f387519c9b004ec14e80696db88ef0f8/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_06ad6c73dfbf6a5fc21334f98c9973c2-brick.pid
[2019-01-22 09:20:17.558888] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_f8ca343c60e6efe541fe02d16ca02a7d/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_525225f65753b05dfe33aeaeb9c5de39-brick.pid
[2019-01-22 09:20:17.565266] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_fe882e074c0512fd9271fc2ff5a0bfe1/192.168.3.6-var-lib-heketi-mounts-vg_28708570b029e5eff0a996c453a11691-brick_d4f30d6e465a8544b759a7016fb5aab5-brick.pid
[2019-01-22 09:20:17.585926] E [MSGID: 106028] [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid of brick process
[2019-01-22 09:20:17.617806] E [MSGID: 106028] [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid of brick process
[2019-01-22 09:20:17.649628] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/glustershd/glustershd.pid
[2019-01-22 09:20:17.649700] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/glustershd/glustershd.pid
So it looks like neither gf_is_service_running() nor glusterd_brick_signal() is able to read the pidfile. That suggests the pidfiles may be empty.
Can you please paste the contents of the brick pidfiles? You can find them in /var/run/gluster/vols/<volname>/, or you can just run this command: "for i in `ls /var/run/gluster/vols/*/*.pid`;do echo $i;cat $i;done"
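A slightly more defensive variant of the loop above, as a sketch only: it avoids parsing ls output, flags empty pidfiles explicitly, and checks whether the recorded PID is alive. check_pidfiles is a hypothetical helper name; the default directory is the one mentioned above. It should be run as root inside the gluster pod, since kill -0 reports failure for processes owned by another user.

```shell
#!/bin/sh
# check_pidfiles (hypothetical helper): list each brick pidfile, say whether
# it is empty, and if not, whether the PID it records is a live process.
# Usage: check_pidfiles [BASE_DIR]   (default: /var/run/gluster/vols)
check_pidfiles() {
    base="${1:-/var/run/gluster/vols}"
    for f in "$base"/*/*.pid; do
        # If the glob matched nothing, the literal pattern remains; skip it.
        [ -e "$f" ] || continue
        if [ ! -s "$f" ]; then
            echo "$f: EMPTY"
            continue
        fi
        # Take the first line and strip whitespace to get the PID.
        pid=$(head -n 1 "$f" | tr -d '[:space:]')
        # kill -0 sends no signal; it only tests that the process exists.
        if kill -0 "$pid" 2>/dev/null; then
            echo "$f: pid=$pid running"
        else
            echo "$f: pid=$pid not running"
        fi
    done
}
```

Calling check_pidfiles with no argument on each node would show at a glance which pidfiles are empty and which point at a dead process.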
On Wed, Jan 23, 2019 at 12:49 PM Shaik Salam <shaik.salam@xxxxxxx> wrote:
Hi Sanju,
Please find the requested information in the attached logs.
The brick below is offline; I tried the start force and heal commands, but it does not come up.
sh-4.2#
sh-4.2# gluster --version
glusterfs 4.1.5
sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       45826
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915
Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------
We can see the following events when we force-start the volume:
/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-21 08:22:53.389049] I [MSGID: 106499] [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8
[2019-01-21 08:23:25.346839] I [MSGID: 106487] [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
We can see the following events when we heal the volume:
[2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume
[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
[2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:22:30.463648] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:34.581555] I [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume
[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:22:53.387992] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:23:25.346319] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
I enabled the DEBUG log level for the bricks, but nothing is being written to the brick log.
gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG
sh-4.2# pwd
/var/log/glusterfs/bricks
sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
-rw-------. 1 root root 0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
From: Sanju Rakonde <srakonde@xxxxxxxxxx>
To: Shaik Salam <shaik.salam@xxxxxxx>
Cc: Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Date: 01/22/2019 02:21 PM
Subject: Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands
Hi Shaik,
Can you please provide us with the complete glusterd and cmd_history logs from all the nodes in the cluster? Also, please paste the output of the following commands (from all nodes):
1. gluster --version
2. gluster volume info
3. gluster volume status
4. gluster peer status
5. ps -ax | grep glusterfsd
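To save some copy-pasting, the five commands above could be captured into one file per node. This is a rough sketch, not an official gluster tool: collect_gluster_info and the default output file name are hypothetical, and the ps line uses the '[g]lusterfsd' pattern only so that grep does not list itself.

```shell
#!/bin/sh
# collect_gluster_info (hypothetical helper): run the requested commands and
# write each one's output, under a header line, into a single file.
# Usage: collect_gluster_info [OUTFILE]
collect_gluster_info() {
    out="${1:-gluster-info-$(hostname).txt}"
    : > "$out"    # start with an empty file
    for cmd in \
        "gluster --version" \
        "gluster volume info" \
        "gluster volume status" \
        "gluster peer status" \
        "ps -ax | grep '[g]lusterfsd'"
    do
        {
            echo "===== $cmd ====="
            # Keep going even if a command fails; its error text is recorded.
            sh -c "$cmd" 2>&1 || true
            echo
        } >> "$out"
    done
    echo "$out"   # print the file name so it is easy to attach
}
```

Running collect_gluster_info on each node and attaching the resulting files would cover items 1 to 5.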
On Tue, Jan 22, 2019 at 12:47 PM Shaik Salam <shaik.salam@xxxxxxx> wrote:
Hi Surya,
This is already a customer setup and we can't redeploy it.
I enabled DEBUG for the brick-level log, but nothing is being written to it.
Can you tell me whether there are other ways to troubleshoot, or other logs to look at?
From: Shaik Salam/HYD/TCS
To: "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Date: 01/22/2019 12:06 PM
Subject: Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands
Hi Surya,
I have enabled the DEBUG log level for the bricks, but nothing is being written to the brick log.
gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG
sh-4.2# pwd
/var/log/glusterfs/bricks
sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
-rw-------. 1 root root 0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
BR
Salam
From: "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>
To: "Shaik Salam" <shaik.salam@xxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Date: 01/22/2019 11:38 AM
Subject: Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands
Hi Shaik,
Can you check what is in the brick logs? They are located in /var/log/glusterfs/bricks/*.
Looks like the samba hooks script failed, but that shouldn't matter in this use case.
Also, I see that you are trying to set up heketi to provision volumes, which means you may be using gluster in container use cases. If you are still in the 'PoC' phase, can you give https://github.com/gluster/gcs a try? That makes the deployment and the stack a little simpler.
-Amar
On Tue, Jan 22, 2019 at 11:29 AM Shaik Salam <shaik.salam@xxxxxxx> wrote:
Can anyone advise how to recover the bricks, other than with heal/start force, based on the log events below?
Please let me know if any other logs are required.
Thanks in advance.
BR
Salam
From: Shaik Salam/HYD/TCS
To: bugs@xxxxxxxxxxx, gluster-users@xxxxxxxxxxx
Date: 01/21/2019 10:03 PM
Subject: Bricks are going offline unable to recover with heal/start force commands
Hi,
Bricks are offline and we are unable to recover them with the following commands:
gluster volume heal <vol-name>
gluster volume start <vol-name> force
The bricks are still offline.
sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       45826
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915
Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------
We can see the following events when we force-start the volume:
/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-21 08:22:53.389049] I [MSGID: 106499] [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8
[2019-01-21 08:23:25.346839] I [MSGID: 106487] [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
We can see the following events when we heal the volume:
[2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume
[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
[2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:22:30.463648] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:34.581555] I [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume
[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:22:53.387992] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:23:25.346319] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
Please let us know the steps to recover the bricks.
BR
Salam
_______________________________________________
Bugs mailing list
Bugs@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/bugs
--
Amar Tumballi (amarts)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Sanju
Attachment:
firsnode_brick.log
Description: Binary data
Attachment:
Secondnode_brick.log
Description: Binary data
Attachment:
Thirdnode_brick.log
Description: Binary data