Atin,
Could it be because bricks are started with PROC_START_NO_WAIT?
Pranith
On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
Hello everyone,
I have a problem that I am trying to resolve and I am not sure which way to go, so I am asking for your advice.
What it comes down to is that upon the initial boot of all my GlusterFS machines the shared volume doesn't get mounted. Nevertheless, the volume is successfully created and started, and further attempts to mount it manually succeed. I suspect what's happening is that the gluster processes/bricks/etc. haven't fully started by the time the /etc/fstab entry is read and the initial mount attempt is made. Again, by the time I log in and run mount -a, the volume mounts without any issues.
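Concretely, the by-hand recovery is nothing more than this (assuming the glusterfs-server init script as shipped in the Debian packaging):

service glusterfs-server status    # glusterd is already up by the time I log in
mount -a                           # re-reads /etc/fstab; the glusterfs mount now succeeds
df -h /opt/shared                  # confirms the volume is mounted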
Details from the logs:
[2015-03-30 22:29:04.381918] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-server=10.12.130.21 --volfile-server=10.12.130.22 --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
[2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-03-30 22:29:04.394950] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2015-03-30 22:29:04.394964] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.21
[2015-03-30 22:29:08.390687] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.21 (Transport endpoint is not connected)
[2015-03-30 22:29:08.390720] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.22
[2015-03-30 22:29:11.392015] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.22 (Transport endpoint is not connected)
[2015-03-30 22:29:11.392050] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.23
[2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex] 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-30 22:29:14.408964] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting frame-timeout to 60
[2015-03-30 22:29:14.409183] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting frame-timeout to 60
[2015-03-30 22:29:14.409388] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting frame-timeout to 60
[2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2: parent translators are ready, attempting connect on transport
Final graph:
....
[2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify] 0-host-client-2: disconnected from host-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2015-03-30 22:29:14.411063] E [MSGID: 108006] [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
[2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /opt/shared
[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/opt/shared'.
Relevant /etc/fstab entries are:
/dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0
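For reference, the manual mount that works after boot is roughly the equivalent of that fstab line, i.e. something like:

mount -t glusterfs \
    -o attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 \
    localhost:/myvolume /opt/shared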
Volume configuration is:
Volume Name: myvolume
Type: Replicate
Volume ID: xxxx
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1:/opt/local/brick
Brick2: host2:/opt/local/brick
Brick3: host3:/opt/local/brick
Options Reconfigured:
storage.health-check-interval: 5
network.ping-timeout: 5
nfs.disable: on
auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
cluster.quorum-type: auto
network.frame-timeout: 60
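(For completeness, a volume with this layout would be created along these lines; a sketch reconstructed from the info output above, not the exact commands I ran:)

gluster volume create myvolume replica 3 transport tcp \
    host1:/opt/local/brick host2:/opt/local/brick host3:/opt/local/brick
gluster volume set myvolume network.ping-timeout 5
gluster volume set myvolume cluster.quorum-type auto
gluster volume start myvolume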
I run Debian 7 with GlusterFS version 3.6.2-2.
While I could put together some rc.local-type script which retries mounting the volume until it succeeds or times out (a rough sketch of what I mean is below), I was wondering if there's a better way to solve this problem?
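A minimal sketch of that workaround, which just retries the fstab-driven mount until the FUSE mount shows up in /proc/mounts or about two minutes pass:

#!/bin/sh
# Retry mounting the shared GlusterFS volume via its fstab entry until it
# appears in /proc/mounts, waiting 5 seconds between attempts (~2 min total).
tries=0
until grep -q ' /opt/shared fuse.glusterfs ' /proc/mounts; do
    if [ "$tries" -ge 24 ]; then
        echo "giving up on mounting /opt/shared" >&2
        exit 1
    fi
    mount /opt/shared 2>/dev/null
    tries=$((tries + 1))
    sleep 5
done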
Thank you for your help.
Regards,
--
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users