Hi!

2008/6/18 Amar S. Tumballi <amar@xxxxxxxxxxxxx>:
> The fix for this issue is in the source repo now. You can try out beating
> it hard again. Let me know about the problems if you have any.

Oh, what a fast response! :) Thanks a lot! Unify is now working as expected.
But not without problems, unfortunately...

When I disconnect one of the nodes and immediately run "ls /home", the
command hangs and can't be killed even with SIGKILL. The client log contains:
----
2008-06-18 17:05:15 W [client-protocol.c:205:call_bail] c54: activating bail-out. pending frames = 2. last sent = 2008-06-18 17:04:29. last received = 2008-06-18 17:03:35 transport-timeout = 42
2008-06-18 17:05:15 C [client-protocol.c:212:call_bail] c54: bailing transport
2008-06-18 17:05:15 W [client-protocol.c:4777:client_protocol_cleanup] c54: cleaning up state in transport object 0x63d790
2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup] c54: forced unwinding frame type(1) op(34) reply=@0x657320
2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 15: (34) / => -1 (107)
2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer] c54: transport_submit failed
2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup] c54: forced unwinding frame type(1) op(34) reply=@0x657320
2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 16: (34) / => -1 (107)
2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer] c54: transport_submit failed
2008-06-18 17:05:19 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:05:27 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:05:48 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:06:43 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:07:55 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:07:55 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
2008-06-18 17:07:55 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
2008-06-18 17:07:55 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 19: (34) / => -1 (107)
2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:07:55 E [client-protocol.c:4572:client_checksum] c54: /: returning EINVAL
------
But after some time (apparently related to the transport timeout), any further
"ls /home" commands succeed.
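
To check the timing against the transport timeout, here is a minimal Python sketch that only does arithmetic on the timestamps copied from the log above. The variable names are mine, and the errno names assume Linux numbering (the log itself pairs 107 with ENOTCONN and 113 with "No route to host"). It says nothing about how the client decides to bail out internally.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def ts(text):
    """Parse a timestamp in the format used by the client log."""
    return datetime.strptime(text, FMT)


# Values copied from the call_bail line in the log.
bail_out = ts("2008-06-18 17:05:15")
last_sent = ts("2008-06-18 17:04:29")
last_received = ts("2008-06-18 17:03:35")
transport_timeout = 42  # seconds, as reported in the log

since_sent = int((bail_out - last_sent).total_seconds())
since_received = int((bail_out - last_received).total_seconds())

# Timestamps of the tcp_connect failures in the log.
connect_attempts = [
    ts("2008-06-18 17:05:19"),
    ts("2008-06-18 17:05:27"),
    ts("2008-06-18 17:05:48"),
    ts("2008-06-18 17:06:43"),
    ts("2008-06-18 17:07:55"),
]
retry_gaps = [
    int((later - earlier).total_seconds())
    for earlier, later in zip(connect_attempts, connect_attempts[1:])
]

# Error numbers seen in the log, with their Linux names (assumption: Linux).
ERRNO_NAMES = {107: "ENOTCONN", 113: "EHOSTUNREACH"}

print("seconds since last sent:", since_sent)
print("seconds since last received:", since_received)
print("both exceed transport-timeout:",
      since_sent > transport_timeout and since_received > transport_timeout)
print("gaps between connect attempts:", retry_gaps)
```

So the bail-out fired once both intervals had passed the 42-second timeout, and the first lookup that failed quickly instead of hanging (17:07:55) coincides with the last connect attempt shown.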
But the log is flooded with messages like:
-----
2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/public_html: returning EINVAL
2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/.mc: returning EINVAL
----
This is not as important as the first issue. But if, for example, I turn the
node off for a couple of days, the log will grow enormously...

WBR,
  Andrey

>
> On Tue, Jun 17, 2008 at 6:49 AM, Amar S. Tumballi <amar@xxxxxxxxxxxxx>
> wrote:
>>
>> I just noticed this behavior, which ideally should not be the case, you
>> will have a fix to it tomorrow.
>>
>>
>> On Tue, Jun 17, 2008 at 6:21 AM, Amar S. Tumballi <amar@xxxxxxxxxxxxx>
>> wrote:
>>>
>>> Currently if the server which got disconnected is having Namespace export
>>> too.. then the lookups return ENOENT (file not found). Otherwise what you
>>> described (whole filesystem will be online without few files).
>>>
>>>
>>> On Tue, Jun 17, 2008 at 4:57 AM, NovA <av.nova@xxxxxxxxx> wrote:
>>>>
>>>> I'm continuing to stress-test glusterFS 1.3.8+ series. Just upgraded
>>>> to tla781. It seems stable in my setup by now, no lockups yet. ;)
>>>> Great!
>>>> But I still can't reveal the desired feature concerning the subj. So I
>>>> have a concrete question. :) What is the supposed behaviour of the
>>>> unify translator (without AFR), when one of the servers disconnected?
>>>> I assumed, that in this case the glusterFS volume should remain online
>>>> with some files being inaccessible (which are on the disconnected
>>>> server). But now, if I plug the network cable out of a cluster node,
>>>> then "ls <unify_volume>" says that it cannot open directory,
>>>> "Transport endpoint is not connected". Am I just believe what I
>>>> desire? Is it supposed that the unify volume goes back online only
>>>> after the disconnected server return?
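
P.S. Until the flooding is dealt with on the client side, repeated messages can at least be collapsed when reading the log. Below is a rough Python sketch under my own assumptions: the regular expression is inferred only from the line shape seen in this thread ("date time LEVEL [file:line:function] volume: message"), and the names LINE_RE and summarize are hypothetical, not part of glusterfs.

```python
import re
from collections import Counter

# Line shape inferred from the log excerpts in this thread (an assumption):
# "YYYY-MM-DD HH:MM:SS L [file.c:line:function] volume: message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]) "
    r"\[([^:\]]+):(\d+):([^\]]+)\] ([^:]+): (.*)$"
)


def summarize(lines):
    """Count log entries per (level, function, volume); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        _when, level, _source, _lineno, func, volume, _msg = match.groups()
        counts[(level, func, volume)] += 1
    return counts


# The six flood lines quoted in this message.
sample = [
    "2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)",
    "2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN",
    "2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/public_html: returning EINVAL",
    "2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)",
    "2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN",
    "2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/.mc: returning EINVAL",
]

summary = summarize(sample)
for (level, func, volume), count in sorted(summary.items()):
    print(f"{count:4d}  {level}  {func}  {volume}")
```

This only shrinks what a reader has to look at; it does not stop the file itself from growing while the node is down.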