You're hitting a race here: when glusterd tries to resolve the address of one of the remote bricks of a volume, the network interface is not yet up. We have fixed this issue in mainline and in the 3.12 branch through the following commit:
~Atin

Note: The 3.12 release is planned for the end of this month.
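To illustrate the race and the fix, here is a minimal Python sketch contrasting the two strategies that the commit describes. It is not glusterd's actual code (glusterd is written in C). The `Brick` record, its `peer_uuid` field, and the functions `resolve_by_address`, `resolve_by_uuid` and `restore_bricks` are hypothetical names chosen for illustration, and the sketch assumes each stored brick records the UUID of the node that owns it.

```python
# Hypothetical sketch, NOT glusterd's real implementation (which is C).
# It contrasts address-based and UUID-based brick resolution at startup.
import socket
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Brick:
    hostname: str   # host part of a "host:/path" brick, as stored on disk
    path: str       # brick directory
    peer_uuid: str  # assumed: UUID of the node that owns this brick


def resolve_by_address(brick: Brick, local_addrs: Set[str]) -> bool:
    """Address-based check: depends on working name resolution.

    socket.getaddrinfo raises socket.gaierror when the hostname cannot be
    resolved, e.g. at boot before the network interface is up. A startup
    path that treats that error as fatal cannot continue.
    """
    infos = socket.getaddrinfo(brick.hostname, None)
    # info[4] is the sockaddr tuple; its first element is the IP address.
    return any(info[4][0] in local_addrs for info in infos)


def resolve_by_uuid(brick: Brick, local_uuid: str) -> bool:
    """UUID-based check: a plain string comparison, no network needed."""
    return brick.peer_uuid == local_uuid


def restore_bricks(bricks: List[Brick], local_uuid: str) -> List[str]:
    """Return the paths of bricks owned by this node, using UUIDs only."""
    return [b.path for b in bricks if resolve_by_uuid(b, local_uuid)]


if __name__ == "__main__":
    local = "11111111-1111-1111-1111-111111111111"  # hypothetical node UUID
    bricks = [
        Brick("node1", "/bricks/b1", local),
        Brick("node2", "/bricks/b2", "22222222-2222-2222-2222-222222222222"),
    ]
    print(restore_bricks(bricks, local))  # ['/bricks/b1']
```

The UUID comparison has no dependency on DNS or interface state, so the restore step no longer races with network startup. Whether the actual change performs exactly this comparison should be checked against the review linked in the commit.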
commit 1477fa442a733d7b1a5ea74884cac8f29fbe7e6a
Author: Gaurav Yadav <gyadav@xxxxxxxxxx>
Date: Tue Jul 18 16:23:18 2017 +0530

glusterd : glusterd fails to start when peer's network interface is down

Problem:
glusterd fails to start on nodes where glusterd tries to come up even
before the network is up.

Fix:
On startup, glusterd tries to resolve the brick path, which is based on
hostname/IP. In the above scenario, when the network interface is not
up, glusterd is not able to resolve the brick path using the IP address
or hostname. With this fix, glusterd will use the UUID to resolve the
brick path.

Change-Id: Icfa7b2652417135530479d0aa4e2a82b0476f710
BUG: 1472267
Signed-off-by: Gaurav Yadav <gyadav@xxxxxxxxxx>
Reviewed-on: https://review.gluster.org/17813
Smoke: Gluster Build System <jenkins@xxxxxxxxxxxxxxxxx>
Reviewed-by: Prashanth Pai <ppai@xxxxxxxxxx>
CentOS-regression: Gluster Build System <jenkins@xxxxxxxxxxxxxxxxx>
Reviewed-by: Atin Mukherjee <amukherj@xxxxxxxxxx>

On Thu, Aug 17, 2017 at 2:45 PM, ismael mondiu <mondiu@xxxxxxxxxxx> wrote:

Hi Team,
I noticed that glusterd never starts when I reboot my Red Hat 7.1 server.
The service is enabled but doesn't work.
I tested with gluster 3.10.4 and gluster 3.10.5, and the problem still exists.
When I start the service manually, it works.
I've also tested on a Red Hat 6.6 server with gluster 3.10.4, and it works fine.
The problem seems to be related to Red Hat 7.1.
Is this a known issue? If yes, can you tell me what the workaround is?
Thanks
Some logs here:
[root@~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2017-08-17 11:04:00 CEST; 2min 9s ago
Process: 851 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=1/FAILURE)

Aug 17 11:03:59 dvihcasc0r systemd[1]: Starting GlusterFS, a clustered file-system server...
Aug 17 11:04:00 dvihcasc0r systemd[1]: glusterd.service: control process exited, code=exited status=1
Aug 17 11:04:00 dvihcasc0r systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Aug 17 11:04:00 dvihcasc0r systemd[1]: Unit glusterd.service entered failed state.
Aug 17 11:04:00 dvihcasc0r systemd[1]: glusterd.service failed.
******************************

****************************** /var/log/glusterfs/glusterd.log ******************************

[2017-08-17 09:04:00.202529] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[2017-08-17 09:04:00.202573] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[2017-08-17 09:04:00.365134] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2017-08-17 09:04:00.365161] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2017-08-17 09:04:00.365195] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-08-17 09:04:00.365206] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-08-17 09:04:00.464314] I [MSGID: 106228] [glusterd.c:500:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2017-08-17 09:04:00.510412] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[2017-08-17 09:04:00.711413] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2017-08-17 09:04:00.756731] E [MSGID: 106187] [glusterd-store.c:4559:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-08-17 09:04:00.756787] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-08-17 09:04:00.756802] E [MSGID: 101066] [graph.c:325:glusterfs_graph_init] 0-management: initializing translator failed
[2017-08-17 09:04:00.756816] E [MSGID: 101176] [graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-08-17 09:04:00.766584] W [glusterfsd.c:1332:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7f9bdef4cabd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1) [0x7f9bdef4c961] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x7f9bdef4be4b] ) 0-: received signum (1), shutting down
******************************

[root@~]# uptime
 11:13:55 up 10 min, 1 user, load average: 0.00, 0.02, 0.04

******************************
_________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users