Re: Initial mount problem - all subvolumes are down

On 03/31/2015 12:53 PM, Atin Mukherjee wrote:

On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
Atin,
        Could it be because bricks are started with PROC_START_NO_WAIT?
That's the correct analysis, Pranith. The mount was attempted before the
bricks were started. If there is a lag of a few seconds between volume
start and the mount attempt, the problem goes away.
Atin,
I think one way to solve this issue is to keep starting the bricks with NO_WAIT, so that we can handle pmap-signin, but to wait for the pmap-signins to complete before responding to the cli or completing 'init'?

Pranith


Pranith
On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
Hello everyone,

I have a problem that I am trying to resolve and I am not sure which way
to go, so here I am asking for your advice.

What it comes down to is that upon initial boot of all my GlusterFS
machines the shared volume doesn't get mounted. Nevertheless, the
volume is successfully created and started, and further attempts to
mount it manually succeed. I suspect that the gluster processes, bricks,
etc. haven't fully started at the time the /etc/fstab entry is read and
the initial mount attempt is made. Again, by the time I log in and run
mount -a, the volume mounts without any issues.

_Details from the logs:_

[2015-03-30 22:29:04.381918] I [MSGID: 100030]
[glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running
/usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs
--log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0
--entry-timeout=0 --volfile-server=localhost
--volfile-server=10.12.130.21 --volfile-server=10.12.130.22
--volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
[2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish]
0-glusterfs: connection to 127.0.0.1:24007
failed (Connection refused)
[2015-03-30 22:29:04.394950] E
[glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
connect with remote-host: localhost (Transport endpoint is not connected)
[2015-03-30 22:29:04.394964] I
[glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
to next volfile server 10.12.130.21
[2015-03-30 22:29:08.390687] E
[glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
connect with remote-host: 10.12.130.21 (Transport endpoint is not
connected)
[2015-03-30 22:29:08.390720] I
[glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
to next volfile server 10.12.130.22
[2015-03-30 22:29:11.392015] E
[glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to
connect with remote-host: 10.12.130.22 (Transport endpoint is not
connected)
[2015-03-30 22:29:11.392050] I
[glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
to next volfile server 10.12.130.23
[2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex]
0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-30 22:29:14.408964] I
[rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting
frame-timeout to 60
[2015-03-30 22:29:14.409183] I
[rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting
frame-timeout to 60
[2015-03-30 22:29:14.409388] I
[rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting
frame-timeout to 60
[2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0:
parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1:
parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2:
parent translators are ready, attempting connect on transport
Final graph:

....

[2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify]
0-host-client-2: disconnected from host-client-2. Client process will
keep trying to connect to glusterd until brick's port is available
*[2015-03-30 22:29:14.411063] E [MSGID: 108006]
[afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes
are down. Going offline until atleast one of them comes back up.
*[2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup]
0-fuse: switched to graph 0
[2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
kernel 7.17
[2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init]
0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init]
0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk]
0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not
connected)
[2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc]
0-fuse: unmounting /opt/shared
*[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit]
(--> 0-: received signum (15), shutting down*
[2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse:
Unmounting '/opt/shared'.


_Relevant /etc/fstab entries are:_

/dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0

localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0


_Volume configuration is:_

Volume Name: myvolume
Type: Replicate
Volume ID: xxxx
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1:/opt/local/brick
Brick2: host2:/opt/local/brick
Brick3: host3:/opt/local/brick
Options Reconfigured:
storage.health-check-interval: 5
network.ping-timeout: 5
nfs.disable: on
auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
cluster.quorum-type: auto
network.frame-timeout: 60


I run Debian 7 with GlusterFS version 3.6.2-2.

While I could put together some rc.local type of script which retries
mounting the volume until it succeeds or times out, I was wondering
if there's a better way to solve this problem?
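
For reference, the retry approach described above could look roughly like
the following. This is only a sketch, not a tested fix: the timeout and
delay values and the helper names retry_until, try_mount and main are
assumptions made for illustration, and it relies on `mount <dir>` picking
up the remaining fields from the /etc/fstab entry for /opt/shared.

```python
#!/usr/bin/env python3
"""Retry mounting a volume until it succeeds or a deadline passes (sketch)."""
import os
import subprocess
import time

MOUNT_POINT = "/opt/shared"   # mount point from the fstab entry in this mail
TIMEOUT_SECONDS = 120         # assumed overall deadline, tune as needed
RETRY_DELAY_SECONDS = 5       # assumed pause between attempts


def retry_until(attempt, timeout, delay, clock=time.monotonic, sleep=time.sleep):
    """Call attempt() until it returns True or the deadline would be exceeded.

    Returns True on success, False if no further attempt fits in `timeout`.
    """
    deadline = clock() + timeout
    while True:
        if attempt():
            return True
        if clock() + delay > deadline:
            return False
        sleep(delay)


def try_mount(mount_point=MOUNT_POINT):
    """Try to mount `mount_point` via /etc/fstab; True once it is mounted."""
    if os.path.ismount(mount_point):
        return True
    # `mount <dir>` looks up the device, type and options in /etc/fstab.
    subprocess.call(["mount", mount_point])
    # Check the mount point itself instead of relying only on the exit
    # status: in the log above the client unmounts again after finding
    # all subvolumes down.
    return os.path.ismount(mount_point)


def main():
    """Return 0 if the volume ended up mounted, 1 otherwise."""
    ok = retry_until(try_mount, TIMEOUT_SECONDS, RETRY_DELAY_SECONDS)
    return 0 if ok else 1
```

To use it from rc.local, the file would end with a call such as
`raise SystemExit(main())`. The clock and sleep parameters exist only so
the retry loop can be exercised without actually waiting.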

Thank you for your help.

Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

