RE: Unify/AFR crashes

Anand, here are my logs from trying to access 2 files that return I/O errors.
I can't read them through the mount point, but I can read them through the
export directory. It's not just these 2 files with I/O errors; there are many
more.
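
A minimal sketch, in Python, of how the mount-versus-export comparison described above could be scripted for a whole list of files. The function names and the example paths are hypothetical and not taken from the setup in this thread; the script only tries to read the first bytes of each file through both roots and records the errno, if any.

```python
import os


def try_read(path, nbytes=4096):
    """Return None if the first bytes of `path` can be read, else the errno."""
    try:
        with open(path, "rb") as fh:
            fh.read(nbytes)
        return None
    except OSError as exc:
        return exc.errno


def compare_reads(mount_root, export_root, rel_paths):
    """Read each relative path through both roots.

    Returns a list of (relative path, errno via mount, errno via export),
    where None means the read succeeded.
    """
    rows = []
    for rel in rel_paths:
        via_mount = try_read(os.path.join(mount_root, rel))
        via_export = try_read(os.path.join(export_root, rel))
        rows.append((rel, via_mount, via_export))
    return rows
```

For example, `compare_reads("/mnt/glusterfs", "/data/export", ["dir/file"])` (both paths are placeholders) would yield a row with an errno in the mount column and None in the export column for a file that behaves like the ones described here.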

There were some more logs, but I think I removed them. They were full of
"activating bail-out" messages, much like Nova's.

P.S.: Nova, these crashes lead to I/O errors, and the NS will crash every time
you try to access these files. Restarting doesn't help... or maybe it's just
that I have so many files that it gets this bad.
Even so, I tried making a new directory and putting the "problem files" into
it, but I got the same thing, with a very strange message: "open on child node
success - failed on namespace".
I hope this helps to fix the bug. Anyway, I'm waiting for Krishna on IRC now;
hopefully he will help too.

http://gluster.pastebin.com/m68290d74


-----Original Message-----
From: gluster-devel-bounces+kotdv=intergam.com@xxxxxxxxxx
[mailto:gluster-devel-bounces+kotdv=intergam.com@xxxxxxxxxx] On Behalf Of
Anand Avati
Sent: Monday, August 11, 2008 2:32 PM
To: NovA
Cc: Gluster Developers Discussion List
Subject: Re: Unify/AFR crashes

can you please repost the logs "with" the timestamps?

avati
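
When comparing reposted logs, it may help to split each line into its fields and count which functions report most often. The following is only a sketch: the regular expression is inferred from the lines quoted in this thread (level letter, then `[file:line:function]`, then translator name and message), and it assumes that a timestamp, when present, is simply whatever text precedes the level. The names `LINE_RE`, `parse_line` and `tally` are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<prefix>.*?)"                      # optional text before the level (e.g. a timestamp)
    r"(?:(?P<level>[A-Z]) )?"                # level letter, missing in some excerpts
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>[^:]+): "
    r"(?P<msg>.*)$"
)


def parse_line(raw):
    """Return a dict of fields for one unwrapped log line, or None if it does not match."""
    match = LINE_RE.match(raw.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["prefix"] = fields["prefix"].strip()
    return fields


def tally(lines):
    """Count parsed lines by (level, function); '?' stands for a missing level."""
    counts = Counter()
    for raw in lines:
        fields = parse_line(raw)
        if fields is not None:
            counts[(fields["level"] or "?", fields["func"])] += 1
    return counts


# Lines copied from the thread, joined where the mail client had wrapped them.
sample = [
    "C [client-protocol.c:212:call_bail] c-ns: bailing transport",
    "E [unify.c:182:unify_lookup_cbk] bricks: c-ns returned 107",
    "[client-protocol.c:1711:client_closedir] c-ns: no proper fd found, returning",
]
for key, count in sorted(tally(sample).items()):
    print(key, count)
```

Note that this works on the original log file, one message per line; the excerpts in the mail are wrapped across two lines and would need to be joined first.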

2008/8/11 NovA <av.nova@xxxxxxxxx>

> Hi!
>
> I'm using glusterfs-1.3.9tla790 unify (without AFR) and also
> periodically have trouble with the NS. Sometimes, when many files are
> copied to the unify volume, the operation stalls, and the client log
> is flooded with messages like:
> W [client-protocol.c:332:client_protocol_xfer] c-ns: not connected at
> the moment to submit frame type(1) op(35)
> The glusterfs server doesn't actually crash, but it eats 100% CPU.
> Restarting it restores normal operation.
>
> There are also many other messages concerning namespace, like
> ------
> W [client-protocol.c:1711:client_closedir] c-ns: no proper fd found,
> returning
> E [client-protocol.c:4834:client_protocol_cleanup] c-ns: forced
> unwinding frame type(1) op(34) reply=@0xc165b0
> E [client-protocol.c:4430:client_lookup_cbk] c-ns: no proper reply
> from server, returning ENOTCONN
> E [unify.c:182:unify_lookup_cbk] bricks: c-ns returned 107
> ------
> W [client-protocol.c:205:call_bail] c-ns: activating bail-out. pending
> frames = 3. last sent = 2008-07-28 16:05:38. last received =
> 1970-01-01 03:00:00 transport-timeout = 42
> C [client-protocol.c:212:call_bail] c-ns: bailing transport
> ------
>      Note the strange last received date...
> -----
> W [client-protocol.c:280:client_protocol_xfer] c-ns: attempting to
> pipeline request type(1) op(34) with handshake
> -----
> W [client-protocol.c:4784:client_protocol_cleanup] c-ns: cleaning up
> state in transport object 0x64
> E [client-protocol.c:4834:client_protocol_cleanup] c-ns: forced
> unwinding frame type(1) op(34)  reply=@0x66c140
> E [client-protocol.c:4430:client_lookup_cbk] c-ns: no proper reply
> from server, returning ENOTCONN
> ----
> [client-protocol.c:4784:client_protocol_cleanup] c-ns: cleaning up
> state in transport object 0x64d310
> [client-protocol.c:4834:client_protocol_cleanup] c-ns: forced
> unwinding frame type(1) op(36) reply=@0x2aaab0048720
> [client-protocol.c:4215:client_setdents_cbk] c-ns: no proper reply
> from server, returning ENOTCONN
> [client-protocol.c:4834:client_protocol_cleanup] c-ns: forced
> unwinding frame type(1) op(36) reply=@0x2aaab0048720
> [client-protocol.c:4215:client_setdents_cbk] c-ns: no proper reply
> from server, returning ENOTCONN
> [client-protocol.c:4834:client_protocol_cleanup] c-ns: forced
> unwinding frame type(1) op(23) reply=@0x2aaab0048720
> [client-protocol.c:3310:client_getdents_cbk] c-ns: no proper reply
> from server, returning ENOTCONN
> [client-protocol.c:1711:client_closedir] c-ns: no proper fd found,
> returning
> [client-protocol.c:4834:client_protocol_cleanup] c-ns: forced
> unwinding frame type(1) op(34) reply=@0x2aaab0048720
> [client-protocol.c:4430:client_lookup_cbk] c-ns: no proper reply from
> server, returning ENOTCONN
> [client-protocol.c:4834:client_protocol_cleanup] c-ns: forced
> unwinding frame type(1) op(22) reply=@0x2aaab0048720
> [client-protocol.c:3767:client_opendir_cbk] c-ns: no proper reply from
> server, returning ENOTCONN
> ----
> E [client-protocol.c:1884:client_fstat] c-ns: : returning EBADFD
> E [unify.c:118:unify_buf_cbk] bricks: c-ns returned 77
> ----
>
> All of these come from different places in the log (rather huge now) and
> are probably not connected to each other. They are just meant to show the
> different types of warnings and errors I've actually seen.
>
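
Two of the numeric details in the excerpts above can be decoded with a few lines of Python. This is a sketch under stated assumptions: errno numbers are platform-specific, so the names printed for 107 and 77 match the ENOTCONN and EBADFD named in the neighbouring log lines only on Linux; and the "strange" last received date would be what Unix time 0 (meaning nothing had been received yet) looks like on a machine whose local time is UTC+3, which is an assumption about the logging host rather than something stated in the thread. The helper names are hypothetical.

```python
import errno
import os
from datetime import datetime, timedelta, timezone


def describe_errno(code):
    """Return 'number = NAME (message)' for an errno value on this platform."""
    name = errno.errorcode.get(code, "unknown")
    return f"{code} = {name} ({os.strerror(code)})"


def epoch_zero_local(utc_offset_hours):
    """Format Unix time 0 as local time for a fixed UTC offset given in hours."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(0, tz).strftime("%Y-%m-%d %H:%M:%S")


# Values from the "c-ns returned 107" and "c-ns returned 77" lines above.
for code in (107, 77):
    print(describe_errno(code))

# Assumption: the machine that wrote the log used UTC+3 as local time.
print(epoch_zero_local(3))  # prints 1970-01-01 03:00:00
```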
> The namespace in my case is just a folder on an ordinary ext3 volume. It's
> for testing purposes; I'm going to use a separate reiserfs volume for it
> later. But it's probably somehow connected with the NS problems I'm
> seeing now.
>
> With best regards,
>  Andrey
>
>
> 2008/8/5 Dmitriy Kotkin <kotdv@xxxxxxxxxxxx>:
> > I'm using glusterfs-1.3.10 (the same for 1.3.9tla787) and unify over
> > afrs setup.
> > The problem is that during intense fs ops every first node listed in
> > AFRs crashes (activating boiling transport and so on).
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



-- 
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.