Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands

Hi Sanju,

Please find the requested information below.

Are you still seeing the error "Unable to read pidfile:" in the glusterd log? >>>> No
Are you seeing the "brick is deemed not to be a part of the volume" error in the glusterd log? >>>> No

sh-4.2# getfattr -m -d -e hex /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
sh-4.2# pwd
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
sh-4.2# getfattr -d -m . -e hex /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vol_3442e86b6d994a14de73f1b8c82cf0b8-client-0=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x15477f3622e84757a0ce9000b63fa849

sh-4.2# ls -la |wc -l
86
sh-4.2# pwd
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
sh-4.2#



From:        "Sanju Rakonde" <srakonde@xxxxxxxxxx>
To:        "Shaik Salam" <shaik.salam@xxxxxxx>
Cc:        "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>, "Murali Kottakota" <murali.kottakota@xxxxxxx>
Date:        01/24/2019 01:38 PM
Subject:        Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands




"External email. Open with Caution"
Shaik,

Previously I suspected that the brick pid file was missing, but I see it is present.

On the second node (where this brick is offline), the pidfile exists and contains:
/var/run/gluster/vols/vol_3442e86b6d994a14de73f1b8c82cf0b8/192.168.3.5-var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.pid
271

Are you still seeing the "Unable to read pidfile:" error in the glusterd log?

I also suspect that the brick may be missing its extended attributes. Are you seeing the "brick is deemed not to be a part of the volume" error in the glusterd log? If not, can you please provide us the output of "getfattr -m -d -e hex <brickpath>"?
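(Note: getfattr's -m option takes a pattern argument, so in the command above "-d" is consumed as the pattern and nothing is printed, which appears to be why the first attempts in the reply above produced no output. A form that dumps all extended attributes, as eventually used in that reply, is:

getfattr -d -m . -e hex <brickpath>)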

On Thu, Jan 24, 2019 at 12:18 PM Shaik Salam <shaik.salam@xxxxxxx> wrote:
Hi Sanju,

Could you please take a look at my issue if you have time (or at least suggest a workaround)?


BR

Salam




From:        Shaik Salam/HYD/TCS
To:        "Sanju Rakonde" <srakonde@xxxxxxxxxx>
Cc:        "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>, "Murali Kottakota" <murali.kottakota@xxxxxxx>
Date:        01/23/2019 05:50 PM
Subject:        Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands



   


Hi Sanju,


Please find the requested information.

Sorry to repeat myself: I am trying the start force command after enabling debug-level brick logging, taking one volume as an example.

Please correct me if I am doing anything wrong.



[root@master ~]# oc rsh glusterfs-storage-vll7x

sh-4.2# gluster volume info vol_3442e86b6d994a14de73f1b8c82cf0b8

Volume Name: vol_3442e86b6d994a14de73f1b8c82cf0b8
Type: Replicate
Volume ID: 15477f36-22e8-4757-a0ce-9000b63fa849
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.3.6:/var/lib/heketi/mounts/vg_ca57f326195c243be2380ce4e42a4191/brick_952d75fd193c7209c9a81acbc23a3747/brick
Brick2: 192.168.3.5:/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
Brick3: 192.168.3.15:/var/lib/heketi/mounts/vg_462ea199185376b03e4b0317363bb88c/brick_1736459d19e8aaa1dcb5a87f48747d04/brick
Options Reconfigured:
diagnostics.brick-log-level: INFO
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       108434
Self-heal Daemon on matrix1.matrix.orange.l
ab                                          N/A       N/A        Y       69525
Self-heal Daemon on matrix2.matrix.orange.l
ab                                          N/A       N/A        Y       18569


sh-4.2# gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG
volume set: success

sh-4.2# gluster volume get vol_3442e86b6d994a14de73f1b8c82cf0b8 all |grep log
cluster.entry-change-log                on
cluster.data-change-log                 on
cluster.metadata-change-log             on
diagnostics.brick-log-level             DEBUG


sh-4.2# cd /var/log/glusterfs/bricks/
sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
-rw-------. 1 root root       0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log  >>> Nothing in log
-rw-------. 1 root root  189057 Jan 18 09:20 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log-20190120
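(The brick log is written by the brick's own glusterfsd process, so an empty log after enabling DEBUG usually just means that process never started. A quick check, using this brick's id:

sh-4.2# ps -ef | grep '[g]lusterfsd' | grep brick_e15c12cceae12c8ab7782dd57cf5b6c1

The '[g]lusterfsd' pattern keeps grep from matching its own process.)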


[2019-01-23 11:49:32.475956] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd

[2019-01-23 11:49:32.483191] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd

[2019-01-23 11:48:59.111292] W [MSGID: 106036] [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: Snapshot list failed

[2019-01-23 11:50:14.112271] E [MSGID: 106026] [glusterd-snapshot.c:3962:glusterd_handle_snapshot_list] 0-management: Volume (vol_63854b105c40802bdec77290e91858ea) does not exist [Invalid argument]

[2019-01-23 11:50:14.112305] W [MSGID: 106036] [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: Snapshot list failed

[2019-01-23 11:50:20.322902] I [glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick

[2019-01-23 11:50:20.322925] I [MSGID: 106142] [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick on port 49165

[2019-01-23 11:50:20.327557] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped

[2019-01-23 11:50:20.327586] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped

[2019-01-23 11:50:20.327604] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed

[2019-01-23 11:50:20.337735] I [MSGID: 106568] [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 69525

[2019-01-23 11:50:21.338058] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd service is stopped

[2019-01-23 11:50:21.338180] I [MSGID: 106567] [glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting glustershd service

[2019-01-23 11:50:21.348234] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped

[2019-01-23 11:50:21.348285] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped

[2019-01-23 11:50:21.348866] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped

[2019-01-23 11:50:21.348883] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped

[2019-01-23 11:50:22.356502] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd

[2019-01-23 11:50:22.368845] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd


sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       109550
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       52557
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       16946

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------
There are no active volume tasks





From:        "Sanju Rakonde" <srakonde@xxxxxxxxxx>
To:        "Shaik Salam" <shaik.salam@xxxxxxx>
Cc:        "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>, "Murali Kottakota" <murali.kottakota@xxxxxxx>
Date:        01/23/2019 02:15 PM
Subject:        Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands




"External email. Open with Caution"

Hi Shaik,

I can see the following errors in the glusterd logs.

[2019-01-22 09:20:17.540196] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_e1aa1283d5917485d88c4a742eeff422/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_9e7c382e5f853d471c347bc5590359af-brick.pid
[2019-01-22 09:20:17.546408] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_f0ed498d7e781d7bb896244175b31f9e/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_47ed9e0663ad0f6f676ddd6ad7e3dcde-brick.pid
[2019-01-22 09:20:17.552575] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_f387519c9b004ec14e80696db88ef0f8/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_06ad6c73dfbf6a5fc21334f98c9973c2-brick.pid
[2019-01-22 09:20:17.558888] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_f8ca343c60e6efe541fe02d16ca02a7d/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_525225f65753b05dfe33aeaeb9c5de39-brick.pid
[2019-01-22 09:20:17.565266] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/vol_fe882e074c0512fd9271fc2ff5a0bfe1/192.168.3.6-var-lib-heketi-mounts-vg_28708570b029e5eff0a996c453a11691-brick_d4f30d6e465a8544b759a7016fb5aab5-brick.pid
[2019-01-22 09:20:17.585926] E [MSGID: 106028] [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid of brick process
[2019-01-22 09:20:17.617806] E [MSGID: 106028] [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid of brick process
[2019-01-22 09:20:17.649628] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/glustershd/glustershd.pid
[2019-01-22 09:20:17.649700] E [MSGID: 101012] [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/glustershd/glustershd.pid

So it looks like neither gf_is_service_running() nor glusterd_brick_signal() is able to read the pid file. That suggests the pidfiles may have nothing in them to read.

Can you please paste the contents of the brick pidfiles? You can find them in /var/run/gluster/vols/<volname>/, or you can just run this command: "for i in `ls /var/run/gluster/vols/*/*.pid`;do echo $i;cat $i;done"
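(A slightly longer sketch of the same check, assuming the standard pidfile locations, which also reports whether each recorded pid is alive:

# print each pidfile, its contents, and whether that pid maps to a running process
for f in /var/run/gluster/vols/*/*.pid; do
    pid=$(cat "$f" 2>/dev/null)
    echo "$f -> ${pid:-<empty>}"
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "  pid $pid is running"
    else
        echo "  pid is missing or not running"
    fi
done)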

On Wed, Jan 23, 2019 at 12:49 PM Shaik Salam <shaik.salam@xxxxxxx> wrote:
Hi Sanju,


Please find the requested information and attached logs.



 

The brick below is offline; I tried the start force and heal commands, but it does not come up.


sh-4.2# gluster --version
glusterfs 4.1.5



sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       45826
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------



We can see the following events when we run start force on the volume:


/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd

[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd

[2019-01-21 08:22:53.389049] I [MSGID: 106499] [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8

[2019-01-21 08:23:25.346839] I [MSGID: 106487] [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req



We can see the following events when we heal the volume:


[2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0

[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume

[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1

[2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5

[2019-01-21 08:22:30.463648] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0

[2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:22:34.581555] I [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume

[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0

[2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5

[2019-01-21 08:22:53.387992] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

[2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0

[2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0

[2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5

[2019-01-21 08:23:25.346319] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

[2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0



I enabled DEBUG mode at the brick level, but nothing is being written to the brick log.


gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG


sh-4.2# pwd

/var/log/glusterfs/bricks


sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
-rw-------. 1 root root       0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log






From:        
Sanju Rakonde <srakonde@xxxxxxxxxx>
To:        
Shaik Salam <shaik.salam@xxxxxxx>
Cc:        
Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Date:        
01/22/2019 02:21 PM
Subject:        
Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands




"External email. Open with Caution"

Hi Shaik,

Can you please provide us the complete glusterd and cmd_history logs from all the nodes in the cluster? Also, please paste the output of the following commands (from all nodes):
1. gluster --version
2. gluster volume info
3. gluster volume status
4. gluster peer status
5. ps -ax | grep glusterfsd
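(If it helps, a minimal sketch to gather all of the above from one node into a single file — the output path is just an example; run it on each node and attach the resulting files:

{
  gluster --version
  gluster volume info
  gluster volume status
  gluster peer status
  ps -ax | grep glusterfsd
} > /tmp/gluster-debug-$(hostname).txt 2>&1)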

On Tue, Jan 22, 2019 at 12:47 PM Shaik Salam <shaik.salam@xxxxxxx> wrote:
Hi Surya,


It is already a customer setup and we cannot redeploy it.

I enabled debug for the brick-level log, but nothing is being written to it.

Can you tell me if there are any other ways to troubleshoot, or other logs to look at?



From:        Shaik Salam/HYD/TCS
To:        "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>
Cc:        "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Date:        01/22/2019 12:06 PM
Subject:        Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands



Hi Surya,


I have enabled DEBUG mode at the brick level, but nothing is being written to the brick log.


gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG


sh-4.2# pwd

/var/log/glusterfs/bricks


sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
-rw-------. 1 root root       0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log

BR

Salam





From:        "Amar Tumballi Suryanarayan" <atumball@xxxxxxxxxx>
To:        "Shaik Salam" <shaik.salam@xxxxxxx>
Cc:        "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Date:        01/22/2019 11:38 AM
Subject:        Re: [Bugs] Bricks are going offline unable to recover with heal/start force commands




"External email. Open with Caution"

Hi Shaik,

Can you check what is in the brick logs? They are located in /var/log/glusterfs/bricks/*.

Looks like the samba hooks script failed, but that shouldn't matter in this use case.

Also, I see that you are trying to set up heketi to provision volumes, which means you may be using Gluster in a container use case. If you are still in the 'PoC' phase, can you give https://github.com/gluster/gcs a try? That makes the deployment and the stack a little simpler.

-Amar




On Tue, Jan 22, 2019 at 11:29 AM Shaik Salam <shaik.salam@xxxxxxx> wrote:
Can anyone advise how to recover the bricks, apart from the heal/start force commands, based on the events from the logs below?

Please let me know if any other logs are required.

Thanks in advance.


BR

Salam




From:        Shaik Salam/HYD/TCS
To:        bugs@xxxxxxxxxxx, gluster-users@xxxxxxxxxxx
Date:        01/21/2019 10:03 PM
Subject:        Bricks are going offline unable to recover with heal/start force commands



Hi,


Bricks are offline and we are unable to recover them with the following commands:

gluster volume heal <vol-name>

gluster volume start <vol-name> force

But the bricks are still offline.
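(One way to see why start force is not bringing the brick back, assuming the default glusterd log location, is to check the management log right after running it:

grep -iE 'brick|pidfile' /var/log/glusterfs/glusterd.log | tail -n 20)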



sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       45826
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------



We can see the following events when we run start force on the volume:


/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd

[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd

[2019-01-21 08:22:53.389049] I [MSGID: 106499] [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8

[2019-01-21 08:23:25.346839] I [MSGID: 106487] [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req



We can see the following events when we heal the volume:


[2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0

[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume

[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1

[2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5

[2019-01-21 08:22:30.463648] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0

[2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:22:34.581555] I [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume

[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0

[2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5

[2019-01-21 08:22:53.387992] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

[2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0

[2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0

[2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5

[2019-01-21 08:23:25.346319] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

[2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0




Please let us know the steps to recover the bricks.



BR

Salam

=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you
_______________________________________________
Bugs mailing list
Bugs@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/bugs


--
Amar Tumballi (amarts)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Thanks,
Sanju


--
Thanks,
Sanju



--
Thanks,
Sanju
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
