Krishna Srinivas wrote:
On Thu, May 8, 2008 at 9:19 PM, Gerry Reno <greno@xxxxxxxxxxx> wrote:
Krishna Srinivas wrote:
Gerry,
In your client spec, "client-local" does not serve any purpose, right?
This is your setup:
server1 and server2 have /home/vmail/mailbrick as storage exports.
On the client you have an AFR which connects to server1 and server2.
The client mounts it on /home/vmail/mailstore.
Can you try mounting from the command line instead of fstab?
When you kill one of the servers, can you check whether anything
shows up in the log files?
Also add "option transport-timeout 5" to the two "protocol/client"
subvolumes (so the timeout will be 5 seconds).
Thanks
Krishna
Two machines.
Each machine has a server storage brick (/home/vmail/mailbrick)
Each machine also has a client (/home/vmail/mailstore)
If one of the machines crashes or needs to be rebooted, the client
mount on the other machine hangs.
I'll unmount the fstab mount, remount from the command line, and let
you know.
Regards,
Gerry
Ok, I ran some tests:
First, when I started, I noticed that 'df' showed two client mounts on
one machine and one client mount on the other. I unmounted the clients
that had been mounted from fstab and changed client.vol to include
"option transport-timeout 5". Then I started the clients from the
command line. I saw one client mount on each machine. I killed one
machine, and the other machine still functioned. I did this a couple
of times. Then I left the timeout in the vol file and rebooted both
machines. They both came back up, and df showed two client mounts on
both machines; ps showed two client processes on both machines. I
killed one machine again and the other machine still functioned. So I
was not able to recreate the hang.
I checked the logs, and both contain thousands of lines like the
following from the past few weeks:
2008-04-26 00:27:55 E [client-protocol.c:4405:client_lookup_cbk]
client2: no proper reply from server, returning ENOTCONN
2008-04-26 00:27:55 E [tcp-client.c:190:tcp_connect] client2:
non-blocking connect() returned: 111 (Connection refused)
2008-04-26 00:27:55 W [client-protocol.c:331:client_protocol_xfer]
client2: not connected at the moment to submit frame type(1) op(22)
2008-04-26 00:27:55 E [client-protocol.c:3742:client_opendir_cbk]
client2: no proper reply from server, returning ENOTCONN
2008-04-26 00:27:55 E [afr_self_heal.c:290:afr_lds_opendir_cbk] afr:
op_ret=-1 op_errno=107
2008-04-26 00:27:55 E [afr_self_heal.c:290:afr_lds_opendir_cbk] afr:
op_ret=-1 op_errno=24
2008-04-26 00:27:55 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
11084: (34) /example.com/john => -1 (5)
2008-04-26 00:27:55 E [tcp-client.c:190:tcp_connect] client2:
non-blocking connect() returned: 111 (Connection refused)
2008-04-26 00:27:55 W [client-protocol.c:331:client_protocol_xfer]
client2: not connected at the moment to submit frame type(1) op(34)
2008-04-26 00:27:55 E [client-protocol.c:4405:client_lookup_cbk]
client2: no proper reply from server, returning ENOTCONN
2008-04-26 00:27:55 E [tcp-client.c:190:tcp_connect] client2:
non-blocking connect() returned: 111 (Connection refused)
2008-04-26 00:27:55 W [client-protocol.c:331:client_protocol_xfer]
client2: not connected at the moment to submit frame type(1) op(34)
2008-04-26 00:27:55 E [client-protocol.c:4405:client_lookup_cbk]
client2: no proper reply from server, returning ENOTCONN
2008-04-26 00:27:55 E [tcp-client.c:190:tcp_connect] client2:
non-blocking connect() returned: 111 (Connection refused)
2008-04-26 00:27:55 W [client-protocol.c:331:client_protocol_xfer]
client2: not connected at the moment to submit frame type(1) op(34)
2008-04-25 19:47:47 E [afr.c:2018:afr_open_cbk] afr:
(path=/example.com/john/dovecot-uidlist.lock child=client2) op_ret=-1
op_errno=2
2008-04-25 19:47:47 E [afr.c:2018:afr_open_cbk] afr:
(path=/example.com/john/dovecot-uidlist.lock child=client1) op_ret=-1
op_errno=2
2008-04-25 19:47:47 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse:
5775: (12) /example.com/john/dovecot-uidlist.lock => -1 (2)
2008-04-25 13:09:02 W [fuse-bridge.c:402:fuse_entry_cbk] glusterfs-fuse:
3883: (34) /example.com/gerryreno/dovecot-keywords => 566935 Rehashing
because st_nlink less than dentry maps
2008-04-25 13:09:02 E [fuse-bridge.c:1140:fuse_unlink] glusterfs-fuse:
3894: UNLINK /example.com/gerryreno/dovecot-uidlist (fuse_loc_fill()
returned NULL inode)
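
With thousands of such lines, a quick tally helps show which errors
dominate. The following is a minimal Python sketch, not part of
GlusterFS: it rejoins the mail-wrapped lines into records, counts them
per log level and function, and translates the numeric codes after
"op_errno=" and "returned: " into symbolic names. The errno table
assumes Linux numbering, and the helper names (join_records, tally,
errnos) are made up for illustration.

```python
import re
from collections import Counter

# Linux errno values that appear in the excerpt above.
# Assumption: Linux numbering; other platforms use different numbers.
LINUX_ERRNO = {
    2: "ENOENT",
    5: "EIO",
    24: "EMFILE",
    107: "ENOTCONN",
    111: "ECONNREFUSED",
}

# Start of one log record: date, time, level, [file:line:function]
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([A-Z]) \[([\w.-]+):(\d+):(\w+)\]"
)


def join_records(lines):
    """Merge wrapped continuation lines back into one record each."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if HEADER.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def tally(records):
    """Count records per (level, function)."""
    counts = Counter()
    for rec in records:
        m = HEADER.match(rec)
        if m:
            counts[(m.group(3), m.group(6))] += 1
    return counts


def errnos(record):
    """Symbolic names for 'op_errno=N' and connect() 'returned: N' codes."""
    codes = re.findall(r"(?:op_errno=|returned: )(\d+)", record)
    return [LINUX_ERRNO.get(int(c), c) for c in codes]


sample = """\
2008-04-26 00:27:55 E [tcp-client.c:190:tcp_connect] client2:
non-blocking connect() returned: 111 (Connection refused)
2008-04-26 00:27:55 E [afr_self_heal.c:290:afr_lds_opendir_cbk] afr:
op_ret=-1 op_errno=107
2008-04-26 00:27:55 E [afr_self_heal.c:290:afr_lds_opendir_cbk] afr:
op_ret=-1 op_errno=24
""".splitlines()

records = join_records(sample)
for (level, func), n in tally(records).most_common():
    print(level, func, n)
for rec in records:
    print(errnos(rec))
```

Read this way, the excerpt shows connection refused (111) and not
connected (107) for client2, a "too many open files" code (24) from the
self-heal opendir callback, and "no such file" (2) for the dovecot lock
file.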
Anyway, I wasn't able to reproduce the hang with transport-timeout
set. I'm still trying to work out why mounting from fstab produces two
client mounts, though. That seems strange.
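
One way to spot the doubled mount without reading df output by eye is
to count entries per mount point in the kernel's mount table. This is
only a sketch: it assumes text in /proc/mounts format (Linux), and the
sample rows, including the device and filesystem type shown for the
GlusterFS mount, are hypothetical.

```python
from collections import Counter


def duplicate_mounts(mounts_text):
    """Return {mount_point: count} for mount points listed more than once."""
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Field 0 is the device, field 1 is the mount point.
            counts[fields[1]] += 1
    return {mp: n for mp, n in counts.items() if n > 1}


# Hypothetical sample in /proc/mounts format.
sample = """\
/dev/sda1 / ext3 rw 0 0
glusterfs /home/vmail/mailstore fuse rw 0 0
glusterfs /home/vmail/mailstore fuse rw 0 0
"""
print(duplicate_mounts(sample))

# On a live Linux system (assumption: /proc/mounts is readable):
# with open("/proc/mounts") as f:
#     print(duplicate_mounts(f.read()))
```

Running this after boot and again after a manual mount would show
whether the second entry comes from fstab processing or from something
else starting a client.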
Regards,
Gerry