Re: Why is it not possible to mount a replicated gluster volume with one Gluster server?

Merlin Morgenstern <merlin.morgenstern@xxxxxxxxx> · Mon, 31 Aug 2015 19:04:46 +0200

Thank you all for your help.
To explain the setup better, here is the goal I am trying to achieve:

- 3 servers running in a cluster, each with a webserver uploading and serving files to visitors from a common glusterfs share.
- Server1 and Server2 have gluster-server installed
- One volume replicated between Server1 and Server2 (one brick on each) with the goal of achieving High Availability
- Server1, Server2 and Server3 mount the volume through FUSE.
- Server1 mounts from Gluster-Server1 with Server2 as backup, and vice versa for Server2.
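
The mounts in the last bullet can be sketched as a pair of commands. The hostnames gs1 and gs2, the volume name volume1 and the mount point /data/nfs are borrowed from the commands quoted further down this thread, and mapping Server1 to gs1 is an assumption. The helper name gluster_mount_cmd is made up for illustration, and it only prints the command rather than running it.

```shell
#!/bin/sh
# gluster_mount_cmd PRIMARY BACKUP VOLUME MOUNTPOINT
# Print (do not run) the FUSE mount command that fetches the volume file
# from PRIMARY and falls back to BACKUP if PRIMARY cannot be reached.
gluster_mount_cmd() {
    primary=$1
    backup=$2
    volume=$3
    mountpoint=$4
    printf 'mount -t glusterfs -o backupvolfile-server=%s %s:/%s %s\n' \
        "$backup" "$primary" "$volume" "$mountpoint"
}

# Server1 prefers itself and falls back to Server2; Server2 does the reverse.
gluster_mount_cmd gs1 gs2 volume1 /data/nfs
gluster_mount_cmd gs2 gs1 volume1 /data/nfs
```

As far as I understand, the backup server is only consulted for fetching the volume file at mount time, so on its own it does not bring up bricks that were never started.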

Now consider the following two scenarios:

1. Server2 dies

In this case Server1 acts as the failover and serves the files for Server1, 2 and 3 until Server2 comes back up again. This works.

2. Server2 dies. Server1 has to reboot.

In this case the service stays down. It is impossible to remount the share on Server1 while Server2 is still down. This is not acceptable for a high-availability system, and I believe it is not intended behaviour but rather a misconfiguration or a bug.

Thank you again for looking into this.

2015-08-31 14:10 GMT+02:00 Yiping Peng <barius.cn@xxxxxxxxx>:
> One more thing, when I do this on server1, which has been in the pool for a long time:
> server1:~$ mount server1:/vol1 mountpoint
> It also fails.
> The log gave me:

My fault, I used localhost as endpoint.

I re-issued "mount -t glusterfs server01:/speech0 qqq"
and the log shows a lot of things like:

[2015-08-31 12:08:44.801169] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
[2015-08-31 12:08:44.801187] E [socket.c:3019:socket_connect] 0-speech0-client-43: Failed to set keep-alive: Protocol not available
[2015-08-31 12:08:44.801305] W [socket.c:642:__socket_rwv] 0-speech0-client-43: readv on 10.88.153.25:24007 failed (Connection reset by peer)
[2015-08-31 12:08:44.801404] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-43: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.801294 (xid=0x17)
[2015-08-31 12:08:44.801423] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-43: received RPC status error [Transport endpoint is not connected]
[2015-08-31 12:08:44.801440] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-43: disconnected from speech0-client-43. Client process will keep trying to connect to glusterd until brick's port is available
[2015-08-31 12:08:44.804488] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
[2015-08-31 12:08:44.804505] E [socket.c:3019:socket_connect] 0-speech0-client-51: Failed to set keep-alive: Protocol not available
[2015-08-31 12:08:44.804775] W [socket.c:642:__socket_rwv] 0-speech0-client-51: readv on 10.88.146.19:24007 failed (Connection reset by peer)
[2015-08-31 12:08:44.804878] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-51: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.804693 (xid=0x18)
[2015-08-31 12:08:44.804898] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-51: received RPC status error [Transport endpoint is not connected]
[2015-08-31 12:08:44.804917] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-51: disconnected from speech0-client-51. Client process will keep trying to connect to glusterd until brick's port is available

2015-08-31 20:06 GMT+08:00 Yiping Peng <barius.cn@xxxxxxxxx>:

> I believe the following events have happened in the cluster resulting
> into this situation:
> 1. GlusterD & brick process on node 2 was brought down
> 2. Node 1 was rebooted.

Strangely enough, glusterfs, glusterd and glusterfsd are running on my server. Is glusterfsd the brick process? Also server01 has not been rebooted during the whole process.

glusterfsd has the following arguments:
/usr/sbin/glusterfsd -s server01.local.net --volfile-id speech0.server01.local.net.home-glusterfs-speech0-brick0 -p /var/lib/glusterd/vols/speech0/run/server01.local.net-home-glusterfs-speech0-brick0.pid -S /var/run/gluster/6bf40a98deade9dde8b615226bc57567.socket --brick-name /home/glusterfs/speech0/brick0 -l /var/log/glusterfs/bricks/home-glusterfs-speech0-brick0.log --xlator-option *-posix.glusterd-uuid=1c33ff18-2a6a-44cf-9a04-727fc96e92be --brick-port 49159 --xlator-option speech0-server.listen-port=49159
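
As far as I know, glusterd is the management daemon, glusterfsd is the per-brick server process, and glusterfs is the client (FUSE mount) process, so a glusterfsd whose --brick-name matches the brick path is the brick process in question. A rough way to check for it, assuming a procps-style ps; the function name brick_running is hypothetical:

```shell
#!/bin/sh
# brick_running BRICK_PATH
# Reads a process listing (one command line per row) on stdin and succeeds
# if some glusterfsd command line carries "--brick-name BRICK_PATH ".
# The trailing space keeps brick0 from matching brick01; it assumes that
# --brick-name is not the last argument, as in the command line above.
brick_running() {
    grep -F -- "--brick-name $1 " | grep -q glusterfsd
}

# Brick path taken from the glusterfsd arguments shown above.
if ps -eo args | brick_running /home/glusterfs/speech0/brick0; then
    echo "brick process found"
else
    echo "brick process not found"
fi
```

`gluster volume status speech0` should report the same thing per brick (an Online column and a PID), provided glusterd itself is reachable.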

One more thing, when I do this on server1, which has been in the pool for a long time:
server1:~$ mount server1:/vol1 mountpoint
It also fails.
The log gave me:

[2015-08-31 11:56:57.123307] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/speech0 qqq)
[2015-08-31 11:56:57.134642] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 9, Protocol not available
[2015-08-31 11:56:57.134688] E [socket.c:3019:socket_connect] 0-glusterfs: Failed to set keep-alive: Protocol not available
[2015-08-31 11:56:57.135063] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-08-31 11:56:57.135113] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection reset by peer)
[2015-08-31 11:56:57.135149] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2015-08-31 11:56:57.135158] I [glusterfsd-mgmt.c:1825:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2015-08-31 11:56:57.135333] W [glusterfsd.c:1219:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3) [0x7fb5e1be39a3] -->/usr/sbin/glusterfs() [0x4099c8] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (1), shutting down
[2015-08-31 11:56:57.135371] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting '/home/speech/pengyiping/qqq'.
[2015-08-31 11:56:57.140640] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x318b207851] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down

Any help is much appreciated.

2015-08-31 19:15 GMT+08:00 Atin Mukherjee <amukherj@xxxxxxxxxx>:
I believe the following events have happened in the cluster, resulting
in this situation:
1. GlusterD & brick process on node 2 was brought down
2. Node 1 was rebooted.

In the above case the mount will definitely fail, since the brick process
was not started: in a 2-node setup glusterd waits for its peers to come
up before it starts the bricks. Could you check whether the brick
process is running or not?
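
A commonly suggested way to check this, and to bring the bricks up while the peer is still down, is `gluster volume status` followed by `gluster volume start <volume> force`. The sketch below assumes the volume name speech0 from the logs in this thread. The run wrapper and the DRY_RUN variable are made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN is 1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

VOLUME=${VOLUME:-speech0}   # assumption: volume name taken from the logs

# Show which bricks glusterd reports as online.
run gluster volume status "$VOLUME"

# Ask glusterd to start any brick processes that are not running,
# even though the volume is already marked as started.
run gluster volume start "$VOLUME" force
```

Whether forcing the bricks up is safe depends on the setup: anything written while one replica is down has to be healed once it returns.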

Thanks,
Atin

On 08/31/2015 04:17 PM, Yiping Peng wrote:
> I've tried both: assuming server1 is already in pool, server2 is undergoing
> peer-probing
>
> server2:~$ mount server1:/vol1 mountpoint, fail;
> server2:~$ mount server2:/vol1 mountpoint, fail.
>
> Strange enough. I *should* be able to mount server1:/vol1 on server2. But
> this is not the case :(
> Maybe something is broken in the server pool, as I'm seeing disconnected
> nodes?
>
> 2015-08-31 18:02 GMT+08:00 Ravishankar N <ravishankar@xxxxxxxxxx>:
>
>> On 08/31/2015 12:53 PM, Merlin Morgenstern wrote:
>>
>> Trying to mount the brick on the same physical server with daemon running
>> on this server but not on the other server:
>>
>> @node2:~$ sudo mount -t glusterfs gs2:/volume1 /data/nfs
>> Mount failed. Please check the log file for more details.
>>
>> For mount to succeed the glusterd must be up on the node that you specify
>> as the volfile-server; gs2 in this case. You can use -o
>> backupvolfile-server=gs1 as a fallback.
>> -Ravi
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://www.gluster.org/mailman/listinfo/gluster-users