Re: Unify lockup with infinite "bailing transport"

Andrey,
can you send your complete log file, especially the section that logs
the frames forcefully unwound after the bail?

avati

On 08/06/2008, NovA <av.nova@xxxxxxxxx> wrote:
>
> Hello everybody!
>
> I've been using GlusterFS for quite a long time. It's a great project! Thanks!
> My old GlusterFS 1.2tla184 is rock stable, but with the new 1.3.x series
> I still have problems :(. Here is a bunch of them for
> version 1.3.9tla772.
>
> I use unify over 24 bricks, each on a different cluster node. Each node
> runs a glusterfs server (exporting the local HDD) and a client (mounting
> the glusterFS unify at /home) as separate processes. The server xlators are:
> storage/posix (plus the name-space on the head node) -> features/posix-locks ->
> tcp/server. The client consists of: tcp/clients -> cluster/unify with the NUFA
> scheduler (ALU on the head node) -> performance/write-behind. Each node
> runs openSUSE 10.3 with kernel 2.6.22.17-0.1 x86_64 and fuse-2.7.2glfs9.
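>
> For reference, a trimmed sketch of spec files along these lines (volume
> names, host names and paths are placeholders, and the real client spec
> lists all 24 bricks plus the name-space):
> -----
> # server side, on each node
> volume brick
>   type storage/posix
>   option directory /home/export      # placeholder export path
> end-volume
>
> volume locks
>   type features/posix-locks
>   subvolumes brick
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   option auth.ip.locks.allow *
>   subvolumes locks
> end-volume
>
> # client side: one protocol/client per brick plus the name-space,
> # with unify and write-behind stacked on top
> volume c1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host node01          # placeholder host name
>   option remote-subvolume locks
> end-volume
>
> volume unify
>   type cluster/unify
>   option scheduler nufa              # alu on the head node
>   option namespace ns                # ns = protocol/client to the name-space
>   option nufa.local-volume-name c1   # option name as I recall it
>   subvolumes c1 c2 c3                # ... through c24 in the real file
> end-volume
>
> volume wb
>   type performance/write-behind
>   subvolumes unify
> end-volume
> -----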
>
> 1) The most annoying problem is a complete glusterFS lockup. It became
> apparent in real-world usage by multiple users. At a random moment, any
> attempt to access glusterFS on the head node (name-space, ALU unify) fails,
> and the client.log is flooded with messages like
> --------
> 2008-06-03 17:11:06 W [client-protocol.c:205:call_bail] c36:
> activating bail-out. pending frames = 3. last sent = 2008-06-03
> 17:10:23. last received = 2008-06-03 17:10:23 transport-timeout = 42
> 2008-06-03 17:11:06 C [client-protocol.c:212:call_bail] c36: bailing
> transport
> 2008-06-03 17:11:06 W [client-protocol.c:205:call_bail] c45:
> activating bail-out. pending frames = 4. last sent = 2008-06-03
> 17:10:23. last received = 2008-06-03 17:10:23 transport-timeout = 42
> 2008-06-03 17:11:06 C [client-protocol.c:212:call_bail] c45: bailing
> transport
> --------
> repeated every minute, indefinitely (with the node names cycling). I
> have two log files of 62 MB and 138 MB filled with such errors (they
> were generated when I left the system unattended for a day). Moreover,
> once glusterfs enters this state it can't be killed, even
> with "killall -9 glusterfs". But on the other cluster nodes (with NUFA
> unify) the logs are free of these messages, and it is possible to access
> the unified FS without a lockup.
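> (To check why "kill -9" has no effect, one thing I look at is whether
> the process is stuck in uninterruptible sleep:
> -----
> ps -o pid,stat,wchan:20,cmd -C glusterfs   # STAT "D" = uninterruptible
> -----
> )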
>
> I can't identify the initial cause of the lockup. Once it happened
> just after I switched off one of the bricks, but most of the time
> there are no unusual actions on the FS, just file/dir creation and
> copying/moving. The logs are too huge and full of other errors (see
> below) to find the cause. BTW, what does this message mean? :)
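> (The only tunable I see connected to those messages is the
> protocol/client transport-timeout; if raising it could help, a minimal
> sketch for one client volume would be, with placeholder names as above:
> -----
> volume c36
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host node36          # placeholder host name
>   option remote-subvolume locks
>   option transport-timeout 120       # seconds; the logs show 42 now
> end-volume
> -----
> )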
>
> 2) The second problem has already been mentioned on the mailing list:
> sometimes files get created twice, on two different bricks, and the file
> becomes inaccessible until I delete one of the copies. Can this be done
> automatically?
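> In the meantime I locate the copies by hand with something like this
> (host names and export path are placeholders for my layout):
> -----
> # list a stuck file on every brick; with unify it should exist on
> # exactly one data brick
> F=path/to/stale_file
> for n in $(seq -w 1 24); do
>   ssh "node$n" "ls -l /home/export/$F" 2>/dev/null
> done
> -----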
>
> 3) My logs are also full of the following error:
> -----
> 2008-06-02 16:03:33 E [unify.c:325:unify_lookup] bricks: returning
> ESTALE for / [translator generation (25) inode generation (23)]
> 2008-06-02 16:03:33 E [fuse-bridge.c:459:fuse_entry_cbk]
> glusterfs-fuse: 301: (34) / => -1 (116)
> 2008-06-02 16:03:33 E [unify.c:325:unify_lookup] bricks: returning
> ESTALE for / [translator generation (25) inode generation (23)]
> 2008-06-02 16:03:33 E [fuse-bridge.c:459:fuse_entry_cbk]
> glusterfs-fuse: 302: (34) / => -1 (116)
> -----
> This error happens when the glusterFS mount point itself is touched
> (for example "ls /home"), but not its subdirs. Despite the error,
> the operation succeeds, though with a lag.
> It seems this is somehow connected with the non-simultaneous
> start of the cluster nodes (namely their glusterfs servers). When all
> nodes are up, remounting glusterfs gets rid of the error.
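> (By "remount" I mean simply the following, with the spec-file path
> being whatever the node actually uses:
> -----
> umount /home
> glusterfs -f /etc/glusterfs/client.vol /home
> -----
> )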
>
> Hope these problems can be resolved...
>
> With best regards,
>   Andrey
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



-- 
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.

