Re: NFS reexport works, still stat-prefetch issues, -s problem

On Thu, 10 May 2007, Brent A Nelson wrote:

[May 10 18:14:18] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=115)
[May 10 18:14:18] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:share4-1: connection to server disconnected
[May 10 18:14:18] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
[May 10 18:14:18] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
[May 10 18:14:18] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:share4-0: connection to server disconnected
[May 10 18:14:18] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
[May 10 18:14:18] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
[May 10 18:14:19] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
[May 10 18:14:19] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=115)
[May 10 18:14:19] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:share4-0: connection to server disconnected
[May 10 18:14:19] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed

I've seen the "0 bytes r/w instead of 113" message plenty of times in the past (with older GlusterFS versions), although it was apparently harmless before. It looks like the code now considers this to be a disconnection and tries to reconnect. For some reason, when it does manage to reconnect, it nevertheless results in an I/O error. I wonder if this relates to a previous issue I mentioned with real disconnects (node dies or glusterfsd is restarted), where the first access after a failure (at least for ls or df) results in an error, but the next attempt succeeds? Seems like an issue with the reconnection logic (and some sort of glitch masquerading as a disconnect in the first place)... This is probably the real problem that is triggering the read-ahead crash (i.e., the read-ahead crash would not be triggered in my test case if it weren't for this issue).


Well, it looks like I can reproduce this behavior (though, so far, not the memory leak) on a much simpler setup, with no NFS required. I was copying my test area (with several 10GB files) to a really simple GlusterFS configuration (one share, no afr, no unify, glusterfsd on the same machine) when I hit the disconnect issue, after a few files had copied successfully. This looks like an issue in protocol/client and/or protocol/server, but I thought it would be a good idea to narrow things down a bit first; the spec files were essentially the minimal ones, as sketched below.
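Roughly speaking (the export path and the "brick" volume name below are placeholders, not my exact files), the server and client specs looked like this:

# server spec, loaded by glusterfsd
volume brick
  type storage/posix
  option directory /export/share        # placeholder export path
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.brick.allow *          # wide open; fine for a local test
  subvolumes brick
end-volume

# client spec, loaded by glusterfs
volume client
  type protocol/client
  option transport-type tcp/client
  option remote-host 127.0.0.1          # glusterfsd on the same machine
  option remote-subvolume brick
end-volume

With no afr, unify, or performance translators in the stack, that leaves protocol/client, protocol/server, and the tcp transport itself as the suspects.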

Thanks,

Brent



