Anand Avati wrote:
Teo,
If you are using glusterfs--mainline--2.4, please add
'option flush-behind off' to the write-behind volume and, if needed,
'option transport-timeout <secs>' to the protocol/client volumes (where
<secs> is sufficiently large). The flush-behind option should most
likely fix your error.
Alternatively, you could just 'tla update' to the latest patch (patch-184)
and the errors should disappear.
I added 'option flush-behind off' and 'option transport-timeout 20'.
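(A rough sketch of where these two options sit in a client spec file, assuming the 1.3-era volume syntax. The volume name "writebehind", the subvolume name "afr", the host address and the remote subvolume name are placeholders for illustration, not the actual configuration used here.)

```
# One protocol/client volume per server; address and brick name are placeholders.
volume client1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1
  option remote-subvolume brick
  # Timeout in seconds, as suggested above.
  option transport-timeout 20
end-volume

# ... the other client volumes and the cluster/afr volume would be defined here ...

# write-behind stacked on top of whatever volume sits below it (placeholder: afr).
volume writebehind
  type performance/write-behind
  option flush-behind off
  subvolumes afr
end-volume
```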
This time it didn't crash right from the beginning: 2 updates and 1
vacuum were OK, but the third update failed.
glu=# update animal set observatii='ok1';
UPDATE 713268
glu=# update animal set observatii='ok2';
UPDATE 713268
glu=# vacuum;
VACUUM
glu=# update animal set observatii='ok3';
ERROR: could not read block 206 of relation 534643271/534643272/534643273: File descriptor in bad state
Thank you for your patience!
Teo
LOGS
Client
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643273,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643276,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643278,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643281,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643285,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643287,
[Jun 24 23:20:30] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 56359 bytes r/w instead of 131158 (errno=104)
[Jun 24 23:20:30] [DEBUG/protocol.c:331/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of block failed
[Jun 24 23:20:30] [DEBUG/client-protocol.c:2609/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x86ca020
[Jun 24 23:20:30] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: connection to server disconnected
--------------------------------------------------
Server (I noticed that server2 and server3 didn't show any CRITICAL
errors in their logs; only server1 had problems)
[Jun 24 23:23:28] [ERROR/common-utils.c:110/full_rwv()] libglusterfs:full_rwv: 98464 bytes r/w instead of 131281 (Connection reset by peer)
[Jun 24 23:23:28] [ERROR/proto-srv.c:117/generic_reply()] protocol/server:transport_writev failed
[Jun 24 23:23:28] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - error: Transport endpoint is not connected
[Jun 24 23:23:28] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 129983 bytes r/w instead of 131299 (errno=107)
[Jun 24 23:23:28] [DEBUG/protocol.c:331/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of block failed
[Jun 24 23:23:28] [DEBUG/proto-srv.c:2826/open_file_cleanup_fn()] protocol/server:force releaseing file 0x8051d30
[Jun 24 23:23:28] [DEBUG/proto-srv.c:2826/open_file_cleanup_fn()] protocol/server:force releaseing file 0x8051c90
[Jun 24 23:23:28] [DEBUG/proto-srv.c:2867/proto_srv_cleanup()] protocol/server:cleaned up xl_private of 0x804b1d0
[Jun 24 23:23:28] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
[Jun 24 23:23:28] [DEBUG/tcp-server.c:229/gf_transport_fini()] tcp/server:destroying transport object for 29.72.76.22:1021 (fd=5)