Amudhan,
I see that you have shared the contents of the gfs-tst volume configuration, whereas the request was for a dump of /var/lib/glusterd/*. I cannot debug this further until you share the correct dump.
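Collecting the /var/lib/glusterd dump requested above could be scripted on each node. This is a minimal sketch, assuming Python 3 is available on the nodes. The function name and the glusterd-<hostname>.tar.gz archive name are hypothetical choices, picked so that dumps from different nodes don't collide:

```python
import socket
import tarfile
from pathlib import Path


def dump_glusterd_state(src: str = "/var/lib/glusterd", out_dir: str = ".") -> Path:
    """Pack the whole glusterd working directory into one .tar.gz file.

    The archive name glusterd-<hostname>.tar.gz is a hypothetical
    convention so that dumps collected from several nodes don't collide.
    """
    src_path = Path(src).resolve()
    if not src_path.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")

    archive = (Path(out_dir) / f"glusterd-{socket.gethostname()}.tar.gz").resolve()
    # Refuse to write the archive inside the tree being archived,
    # otherwise tar would try to include its own output file.
    if src_path in archive.parents:
        raise ValueError("write the archive outside the directory being archived")

    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps member paths short, e.g. glusterd/vols/gfs-tst/info
        # instead of var/lib/glusterd/vols/gfs-tst/info.
        tar.add(str(src_path), arcname=src_path.name)
    return archive
```

Calling dump_glusterd_state(out_dir="/tmp") as root on every node would produce one archive per node, which can then be attached to the thread.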
On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
Can you please run 'glusterd -LDEBUG' and share the resulting glusterd.log? Rather than going back and forth too many times, I suggest you share the contents of /var/lib/glusterd from all the nodes. Also, please mention on which node the glusterd service is unable to come up.

On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83@xxxxxxxxx> wrote:

I have created the folder at the path as suggested, but the service still fails to start. Below is the error message in glusterd.log:

[2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-01-16 14:50:14.563834] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
[2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-01-16 14:50:15.565868] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
[2019-01-16 14:50:15.642532] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
[2019-01-16 14:50:15.675333] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-01-16 14:50:15.675421] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-01-16 14:50:15.676912] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2019-01-16 14:50:15.676956] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2019-01-16 14:50:15.676973] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
[2019-01-16 14:50:15.676986] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down

On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

If gluster volume info/status shows the brick as /media/disk4/brick4, then you'd need to mount the same path, and hence you'd need to create the brick4 directory explicitly. I don't follow how only /media/disk4 could be used as the mount path for the brick.

On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83@xxxxxxxxx> wrote:

Yes, I did mount the bricks, but the folder 'brick4' has still not been created inside the brick. Do I need to create this folder? When I run replace-brick, it creates the folder inside the brick itself.
I have seen this behavior before when running replace-brick or when heal begins.

On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83@xxxxxxxxx> wrote:

Atin,

I have copied the contents of 'gfs-tst' from the vols folder on another node. Starting the service still fails, with this error message in glusterd.log:

[2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
[2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
[2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
[2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]

This means that the underlying brick /media/disk4/brick4 doesn't exist. You mentioned that you had replaced the faulty disk; have you not mounted it yet?

[2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
[2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down

On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

This is a case of a partially written transaction. The host ran out of space on the root partition, where all glusterd-related configuration is persisted, so the transaction couldn't be written, and hence the new (replaced) brick's information wasn't persisted in the configuration.
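The partial write described above can be checked directly on the failing node: a brick whose information was never persisted has no file under /var/lib/glusterd/vols/gfs-tst/bricks/. Judging by the glusterd.log error quoted further down in this thread (IP.3:-media-disk3-brick3 for brick IP.3:/media/disk3/brick3), each file appears to be named host:path with every '/' in the path replaced by '-'. This is a minimal sketch under that assumption. The function names are hypothetical, and the expected brick list has to be supplied by hand, for example from `gluster volume info` run on a healthy node:

```python
from pathlib import Path
from typing import Iterable, List


def brick_store_name(host: str, brick_path: str) -> str:
    """Name of the per-brick file under vols/<volume>/bricks/.

    Assumption inferred from the glusterd.log path in this thread:
    ("IP.3", "/media/disk3/brick3") -> "IP.3:-media-disk3-brick3".
    """
    return f"{host}:{brick_path.replace('/', '-')}"


def missing_brick_files(volume_dir: str, bricks: Iterable[str]) -> List[str]:
    """Return every "host:/path" brick that has no store file in volume_dir/bricks.

    Assumes the host part contains no ':' (true for names like IP.3,
    not for raw IPv6 addresses).
    """
    bricks_dir = Path(volume_dir) / "bricks"
    missing = []
    for brick in bricks:
        host, _, path = brick.partition(":")
        if not (bricks_dir / brick_store_name(host, path)).is_file():
            missing.append(brick)
    return missing
```

For example, missing_brick_files("/var/lib/glusterd/vols/gfs-tst", ["IP.3:/media/disk3/brick3", "IP.3:/media/disk4/brick4"]) would list whichever of those two bricks has no store file on the node where it runs.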
The workaround is to copy the contents of /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted storage pool to the node where the glusterd service fails to come up. After that, restarting the glusterd service should get peer status to report all nodes as healthy and connected.

On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83@xxxxxxxxx> wrote:

Hi,

In short: when I start the glusterd service on one server, I get the following error message in the glusterd.log file. What needs to be done?

Error logged in glusterd.log:

[2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
[2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
[2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
[2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
[2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
[2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
[2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes

In long: I am trying to simulate a situation where the volume stopped abnormally and the entire cluster restarted with some disks missing.

My test cluster is set up with 3 nodes, each with four disks, and I have set up a volume with disperse 4+2.

On Node-3, 2 disks failed; to replace them I shut down all systems. Below are the steps I followed.

1. Unmounted the volume on the client machine.
2. Shut down all systems by running the `shutdown -h now` command (without stopping the volume or the service).
3. Replaced the faulty disks in Node-3.
4. Powered on all systems.
5. Formatted the replaced drives and mounted all drives.
6. Started the glusterd service on all nodes (success).
7. Ran the `volume status` command from node-3.
   output: [2019-01-15 16:52:17.718422] : v status : FAILED : Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
8. Ran the `volume start gfs-tst` command from node-3.
   output: [2019-01-15 16:53:19.410252] : v start gfs-tst : FAILED : Volume gfs-tst already started
9. Ran `gluster v status` on another node. It shows all bricks available, but the 'self-heal daemon' is not running.

   @gfstst-node2:~$ sudo gluster v status
   Status of volume: gfs-tst
   Gluster process                       TCP Port  RDMA Port  Online  Pid
   ------------------------------------------------------------------------------
   Brick IP.2:/media/disk1/brick1        49152     0          Y       1517
   Brick IP.4:/media/disk1/brick1        49152     0          Y       1668
   Brick IP.2:/media/disk2/brick2        49153     0          Y       1522
   Brick IP.4:/media/disk2/brick2        49153     0          Y       1678
   Brick IP.2:/media/disk3/brick3        49154     0          Y       1527
   Brick IP.4:/media/disk3/brick3        49154     0          Y       1677
   Brick IP.2:/media/disk4/brick4        49155     0          Y       1541
   Brick IP.4:/media/disk4/brick4        49155     0          Y       1683
   Self-heal Daemon on localhost         N/A       N/A        Y       2662
   Self-heal Daemon on IP.4              N/A       N/A        Y       2786

10. Since the output above says 'volume already started', I ran the `reset-brick` command:
   v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force
   output: [2019-01-15 16:57:37.916942] : v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : /media/disk3/brick3 is already part of a volume
11. The reset-brick command was not working, so I tried stopping the volume and starting it with the force option.
   output: [2019-01-15 17:01:04.570794] : v start gfs-tst force : FAILED : Pre-validation failed on localhost. Please check log file for details
12. Stopped the service on all nodes and tried starting it again. The service started successfully on every node except node-3, where I get the following message:
   sudo service glusterd start
   * Starting glusterd service glusterd [fail]
   /usr/local/sbin/glusterd: option requires an argument -- 'f'
   Try `glusterd --help' or `glusterd --usage' for more information.
13. Checking the glusterd log file, I found that the OS drive was running out of space:
   [2019-01-15 16:51:37.210792] W [MSGID: 101012] [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space left on device]
   [2019-01-15 16:51:37.210874] E [MSGID: 106190] [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: Unable to write volume values for gfs-tst
14. Cleared some space on the OS drive, but the service still does not start. Below is the error logged in glusterd.log:
   [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
   [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
   [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
   [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
   [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
   [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
   [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
   [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
   [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
   [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
   [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
   [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
   [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
   [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
   [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
   [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
   [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
15. On another node, running `volume status` still shows the bricks as live, but 'peer status' shows node-3 as disconnected.

   @gfstst-node2:~$ sudo gluster v status
   Status of volume: gfs-tst
   Gluster process                       TCP Port  RDMA Port  Online  Pid
   ------------------------------------------------------------------------------
   Brick IP.2:/media/disk1/brick1        49152     0          Y       1517
   Brick IP.4:/media/disk1/brick1        49152     0          Y       1668
   Brick IP.2:/media/disk2/brick2        49153     0          Y       1522
   Brick IP.4:/media/disk2/brick2        49153     0          Y       1678
   Brick IP.2:/media/disk3/brick3        49154     0          Y       1527
   Brick IP.4:/media/disk3/brick3        49154     0          Y       1677
   Brick IP.2:/media/disk4/brick4        49155     0          Y       1541
   Brick IP.4:/media/disk4/brick4        49155     0          Y       1683
   Self-heal Daemon on localhost         N/A       N/A        Y       2662
   Self-heal Daemon on IP.4              N/A       N/A        Y       2786

   Task Status of Volume gfs-tst
   ------------------------------------------------------------------------------
   There are no active volume tasks

   root@gfstst-node2:~$ sudo gluster pool list
   UUID                                    Hostname   State
   d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3       Disconnected
   c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4       Connected
   0083ec0c-40bf-472a-a128-458924e56c96    localhost  Connected

   root@gfstst-node2:~$ sudo gluster peer status
   Number of Peers: 2

   Hostname: IP.3
   Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
   State: Peer in Cluster (Disconnected)

   Hostname: IP.4
   Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
   State: Peer in Cluster (Connected)

regards
Amudhan
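All of the glusterd.log excerpts in this thread share one line layout, and in each failed start the informative lines are the E (error) entries, such as "resolve brick failed in restore" or the missing IP.3:-media-disk3-brick3 store file. This is a minimal Python sketch for pulling those entries out of a log. It assumes the layout seen in these excerpts ([timestamp] LEVEL [MSGID: n] [file:line:function] message, where the MSGID part is sometimes absent). The regex and function names are illustrative only and are not part of GlusterFS:

```python
import re
from typing import List, NamedTuple, Optional

# Layout seen in the glusterd.log excerpts in this thread (an assumption, not a spec):
# [2019-01-16 14:50:15.676912] E [MSGID: 106187] [file.c:line:function] message
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"  # absent on some lines, e.g. rpc-transport
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    level: str
    msgid: Optional[str]
    src: str
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its fields; None if it doesn't match the layout."""
    m = LOG_RE.match(line.strip())
    return LogEntry(**m.groupdict()) if m else None


def error_entries(text: str) -> List[LogEntry]:
    """Return the E (error) entries of a log, in file order."""
    entries = (parse_line(line) for line in text.splitlines())
    return [e for e in entries if e is not None and e.level == "E"]
```

Feeding it the full text of glusterd.log returns only the error lines, in order. Lines that don't match the assumed layout are skipped rather than reported.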
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users