Andrey,
  can you send your complete log file, especially the section which
logs the frames forcefully unwound after the bail?

avati

On 08/06/2008, NovA <av.nova@xxxxxxxxx> wrote:
>
> Hello everybody!
>
> I've been using GlusterFS for quite a long time. It's a great
> project! Thanks! My old GlusterFS 1.2tla184 is rock stable, but with
> the new 1.3.x series I still have problems :(. Here is a bunch of
> them for version 1.3.9tla772.
>
> I use unify over 24 bricks, each on a different cluster node. Each
> node runs a glusterfs server (exporting the local HDD) and a client
> (mounting the glusterFS unify at /home) as separate processes. The
> server xlators are: storage/posix (and name-space on the head node)
> -> features/posix-locks -> tcp/server. The client consists of:
> tcp/clients -> cluster/unify NUFA (except the head node, which uses
> ALU) -> performance/write-behind. Each node runs openSUSE 10.3 with
> kernel 2.6.22.17-0.1 x86_64 and fuse-2.7.2glfs9.

[A sketch of spec files for this layout is given at the end of this post.]

> 1) The most annoying problem is a complete glusterFS lockup. It
> became apparent in real-world usage by multiple users. At a random
> moment, any attempt to access glusterFS on the head node (name-space,
> ALU unify) fails and the client.log is flooded by messages like
> --------
> 2008-06-03 17:11:06 W [client-protocol.c:205:call_bail] c36:
> activating bail-out. pending frames = 3. last sent = 2008-06-03
> 17:10:23. last received = 2008-06-03 17:10:23 transport-timeout = 42
> 2008-06-03 17:11:06 C [client-protocol.c:212:call_bail] c36: bailing
> transport
> 2008-06-03 17:11:06 W [client-protocol.c:205:call_bail] c45:
> activating bail-out. pending frames = 4. last sent = 2008-06-03
> 17:10:23. last received = 2008-06-03 17:10:23 transport-timeout = 42
> 2008-06-03 17:11:06 C [client-protocol.c:212:call_bail] c45: bailing
> transport
> --------
> repeated endlessly every minute (with the node names changing in a
> loop). I have two log files of 62 MB and 138 MB filled with such
> errors (they were generated when I left the system unattended for a
> day). Moreover, when glusterfs enters such a state it can't be
> killed, even with "killall -9 glusterfs". But on the other cluster
> nodes (with NUFA unify) the logs are free of these messages, and the
> unify FS can still be accessed without a lockup.
>
> I can't identify the initial cause of the lockup. Once it happened
> just after I switched off one of the bricks. But most of the time
> there are no unusual actions on the FS, just file/dir creation and
> copying/moving. The logs are too huge and full of other errors (see
> below) to find the cause. BTW, what does this message mean? :)
>
> 2) The second problem has already been mentioned on the mailing
> list: sometimes a file gets created twice, on two different bricks,
> and it becomes inaccessible until I delete one copy. Can this be
> handled automatically?
>
> 3) My logs are also full of the following error:
> -----
> 2008-06-02 16:03:33 E [unify.c:325:unify_lookup] bricks: returning
> ESTALE for / [translator generation (25) inode generation (23)]
> 2008-06-02 16:03:33 E [fuse-bridge.c:459:fuse_entry_cbk]
> glusterfs-fuse: 301: (34) / => -1 (116)
> 2008-06-02 16:03:33 E [unify.c:325:unify_lookup] bricks: returning
> ESTALE for / [translator generation (25) inode generation (23)]
> 2008-06-02 16:03:33 E [fuse-bridge.c:459:fuse_entry_cbk]
> glusterfs-fuse: 302: (34) / => -1 (116)
> -----
> This error happens when the glusterFS mount point itself is touched
> somehow (for example "ls /home"), but not its subdirs. Despite the
> error, such an operation succeeds, but with a lag.
> It seems that this is somehow connected with the non-simultaneous
> start of the cluster nodes (namely, their glusterfs servers). When
> all nodes are up, remounting glusterfs gets rid of the mentioned
> error.
>
> Hope these problems can be resolved...
>
> With best regards,
>   Andrey
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>

--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.
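For reference, a minimal sketch of what 1.3-style volume spec files for
the layout described in the message above might look like. The volume
names, directories, and hostnames are invented for illustration, only
two bricks are shown, and some option keys (the auth and NUFA options
in particular) are from memory and should be checked against the
example spec files shipped with the release; this is not Andrey's
actual configuration.

--------
# server.vol on an ordinary brick node (illustrative)
volume brick
  type storage/posix
  # local HDD being exported
  option directory /data/export
end-volume

volume locks
  type features/posix-locks
  subvolumes brick
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  # wide open for the sketch; restrict in a real setup
  option auth.ip.locks.allow *
  subvolumes locks
end-volume

# client.vol on a compute node (illustrative; only two bricks shown)
volume c01
  type protocol/client
  option transport-type tcp/client
  option remote-host node01
  option remote-subvolume locks
  # the value printed in the call_bail log lines above
  option transport-timeout 42
end-volume

volume c02
  type protocol/client
  option transport-type tcp/client
  option remote-host node02
  option remote-subvolume locks
  option transport-timeout 42
end-volume

# name-space brick exported by the head node
volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host head
  option remote-subvolume locks-ns
end-volume

volume unify
  type cluster/unify
  option namespace ns
  # the head node would use "alu" here instead of "nufa"
  option scheduler nufa
  # option name from memory; please verify
  option nufa.local-volume-name c01
  subvolumes c01 c02
end-volume

volume writebehind
  type performance/write-behind
  subvolumes unify
end-volume
--------

The transport-timeout option on the protocol/client volumes corresponds
to the "transport-timeout = 42" printed in the call_bail messages: the
bail-out fires when a connection has pending frames but has received no
reply within that window, so those log lines point at replies that
stopped arriving rather than being the root problem themselves.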