NovA,
Another valid point, which has been on our todo list for a long time. It will be done soon.

Regards,
Amar

2008/6/18 NovA <av.nova@xxxxxxxxx>:
> Hi!
>
> 2008/6/18 Amar S. Tumballi <amar@xxxxxxxxxxxxx>:
> > The fix for this issue is in the source repo now. You can try out beating
> > it hard again. Let me know about the problems if you have any.
> Oh, what a fast response! :) Thanks a lot!
> Unify is working now as expected. But not without problems,
> unfortunately...
>
> When I disconnect one of the nodes and immediately run "ls /home", the
> command hangs and can't be killed even with SIGKILL. The client log
> contains:
> ----
> 2008-06-18 17:05:15 W [client-protocol.c:205:call_bail] c54: activating bail-out. pending frames = 2. last sent = 2008-06-18 17:04:29. last received = 2008-06-18 17:03:35 transport-timeout = 42
> 2008-06-18 17:05:15 C [client-protocol.c:212:call_bail] c54: bailing transport
> 2008-06-18 17:05:15 W [client-protocol.c:4777:client_protocol_cleanup] c54: cleaning up state in transport object 0x63d790
> 2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup] c54: forced unwinding frame type(1) op(34) reply=@0x657320
> 2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
> 2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
> 2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
> 2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 15: (34) / => -1 (107)
> 2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer] c54: transport_submit failed
> 2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup] c54: forced unwinding frame type(1) op(34) reply=@0x657320
> 2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
> 2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
> 2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
> 2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 16: (34) / => -1 (107)
> 2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer] c54: transport_submit failed
> 2008-06-18 17:05:19 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
> 2008-06-18 17:05:27 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
> 2008-06-18 17:05:48 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
> 2008-06-18 17:06:43 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
> 2008-06-18 17:07:55 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
> 2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
> 2008-06-18 17:07:55 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
> 2008-06-18 17:07:55 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
> 2008-06-18 17:07:55 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 19: (34) / => -1 (107)
> 2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
> 2008-06-18 17:07:55 E [client-protocol.c:4572:client_checksum] c54: /: returning EINVAL
> ------
>
> But after some time (apparently related to the transport timeout) any
> further "ls /home" commands succeed.
> But the log is flooded by messages like:
> -----
> 2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
> 2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/public_html: returning EINVAL
> 2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
> 2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/.mc: returning EINVAL
> ----
> This is not as important as the first issue. But if, for
> example, I turn the node off for a couple of days, then the log will
> grow enormously...
>
> WBR,
> Andrey
>
> > On Tue, Jun 17, 2008 at 6:49 AM, Amar S. Tumballi <amar@xxxxxxxxxxxxx>
> > wrote:
> >>
> >> I just noticed this behavior, which ideally should not be the case. You
> >> will have a fix for it tomorrow.
> >>
> >>
> >> On Tue, Jun 17, 2008 at 6:21 AM, Amar S. Tumballi <amar@xxxxxxxxxxxxx>
> >> wrote:
> >>>
> >>> Currently, if the server which got disconnected also holds the Namespace
> >>> export, then the lookups return ENOENT (file not found). Otherwise it
> >>> behaves as you described (the whole filesystem stays online, minus a few files).
> >>>
> >>>
> >>> On Tue, Jun 17, 2008 at 4:57 AM, NovA <av.nova@xxxxxxxxx> wrote:
> >>>>
> >>>> I'm continuing to stress-test the glusterFS 1.3.8+ series. Just upgraded
> >>>> to tla781. It seems stable in my setup by now, no lockups yet. ;)
> >>>> Great!
> >>>> But I still can't get the desired behaviour mentioned in the subject. So I
> >>>> have a concrete question. :) What is the intended behaviour of the
> >>>> unify translator (without AFR) when one of the servers is disconnected?
> >>>> I assumed that in this case the glusterFS volume should remain online
> >>>> with some files being inaccessible (those on the disconnected
> >>>> server). But now, if I unplug the network cable from a cluster node,
> >>>> then "ls <unify_volume>" says that it cannot open the directory:
> >>>> "Transport endpoint is not connected". Am I just believing what I
> >>>> desire? Is the unify volume supposed to come back online only
> >>>> after the disconnected server returns?
> >

--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!
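
To put a number on the log flooding described above, the repeated messages can be tallied per log level and reporting function. The following is only a rough sketch in Python, not part of GlusterFS: the field layout (timestamp, one-letter level, [file:line:function], translator name, message) is inferred from the excerpts quoted in this thread, and the names parse_line and summarize are hypothetical helpers made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from the quoted excerpts (an assumption, not a documented format):
#   2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: message
# Leading "> " quote markers from the mail are tolerated.
LOG_LINE = re.compile(
    r"^(?:>\s*)*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>[^:]+): "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count log entries per (level, function); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["level"], record["func"])] += 1
    return counts


# Sample taken verbatim from the flood quoted in this thread.
SAMPLE = [
    "> -----",
    "> 2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)",
    "> 2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN",
    "> 2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/public_html: returning EINVAL",
    "> 2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)",
    "> 2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN",
    "> 2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/.mc: returning EINVAL",
]

if __name__ == "__main__":
    for (level, func), count in summarize(SAMPLE).most_common():
        print(f"{count:4d}  {level}  {func}")
```

Run against a real client log (for example by passing an open file object to summarize), this shows which call sites dominate while a node is down. In the sample above each lookup on the disconnected brick produces three entries, one warning and two errors.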