Hi everyone,

I hope you can help me or at least point me in the right direction. I'm running some stress tests on a 16-disk striped setup, using 300 MByte files created with "dd if=/dev/zero of=smalltest.file bs=1048576 count=300". The tests consist of reading those files back, in a loop, in the same manner with "dd of=/dev/null if=smalltest.file bs=1048576 count=300". Both the server and the client are dual-core Intel Xeon machines connected by a 10 Gbit link, and the OS is Ubuntu 10.10 with kernel 2.6.36.2. The tests were done with a freshly compiled git version of GlusterFS (v3.1.1-52-gcbba1c3).
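Concretely, the read side of the test is essentially just the following loop (this is a simplified sketch of my wrapper script; the file name and count are the ones given above):

+------------------------------------------------------------------------------+
#!/bin/sh
# Simplified version of the test script: keep re-reading the file from the
# GlusterFS mount until interrupted, printing an iteration counter.
i=0
while true; do
    i=$((i + 1))
    echo "iteration $i"
    dd of=/dev/null if=smalltest.file bs=1048576 count=300
done
+------------------------------------------------------------------------------+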
After some time (always a couple of hours), during which GlusterFS works great while it is only reading the already-created files, the CPU usage of one of the glusterfsd processes climbs to 340% (it uses almost all four cores) and the process seems to become unresponsive. Is this a known issue, or is there a mistake on my part (configs, etc.)?

The server logs don't show anything relevant (server log attached), but in the client logs I have the following (also attached):

[2011-01-05 20:43:41.258672] E [client-handshake.c:116:rpc_client_ping_timer_expired] stripe1-client-8: Server 10.0.0.1:24033 has not responded in the last 42 seconds, disconnecting.
[2011-01-05 20:43:41.271821] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2011-01-05 20:42:26.310399
[2011-01-05 20:43:41.272034] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2011-01-05 20:42:26.310523
[2011-01-05 20:43:41.272088] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2011-01-05 20:42:26.310556
[2011-01-05 20:43:41.272140] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2011-01-05 20:42:26.310575
[2011-01-05 20:43:41.272191] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2011-01-05 20:42:26.310595
[2011-01-05 20:43:41.272241] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2011-01-05 20:42:26.310618
[2011-01-05 20:43:41.272292] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2011-01-05 20:42:59.256051
[2011-01-05 20:43:41.272315] I [client.c:1590:client_rpc_notify] stripe1-client-8: disconnected

The configuration for each server subvolume is:

+------------------------------------------------------------------------------+
volume stripe1-posix
    type storage/posix
    option directory /data_b
end-volume

volume stripe1-access-control
    type features/access-control
    subvolumes stripe1-posix
end-volume

volume stripe1-locks
    type features/locks
    subvolumes stripe1-access-control
end-volume

volume stripe1-io-threads
    type performance/io-threads
    subvolumes stripe1-locks
end-volume

volume /data_b
    type debug/io-stats
    subvolumes stripe1-io-threads
end-volume

volume stripe1-server
    type protocol/server
    option transport-type tcp
    option auth.addr./data_b.allow *
    subvolumes /data_b
end-volume
+------------------------------------------------------------------------------+

The client-side configuration used is:

+------------------------------------------------------------------------------+
volume stripe1-client-0
    type protocol/client
    option remote-host 10.0.0.1
    option remote-subvolume /data_b
    option transport-type tcp
end-volume

<repeated 15 more times, once for each remaining subvolume>

volume stripe1-stripe-0
    type cluster/stripe
    option block-size 1MB
    subvolumes stripe1-client-0 stripe1-client-1 stripe1-client-2 stripe1-client-3 stripe1-client-4 stripe1-client-5 stripe1-client-6 stripe1-client-7 stripe1-client-8 stripe1-client-9 stripe1-client-10 stripe1-client-11 stripe1-client-12 stripe1-client-13 stripe1-client-14 stripe1-client-15
end-volume

volume stripe1-write-behind
    type performance/write-behind
    subvolumes stripe1-stripe-0
end-volume

volume stripe1-read-ahead
    type performance/read-ahead
    option page-count 128
    option page-size 8388608
    subvolumes stripe1-write-behind
end-volume

volume stripe1-io-cache
    type performance/io-cache
    option cache-size 1GB
    subvolumes stripe1-read-ahead
end-volume

volume stripe1-quick-read
    type performance/quick-read
    subvolumes stripe1-io-cache
end-volume

volume stripe1-stat-prefetch
    type performance/stat-prefetch
    subvolumes stripe1-quick-read
end-volume

volume stripe1
    type debug/io-stats
    subvolumes stripe1-stat-prefetch
end-volume
+------------------------------------------------------------------------------+

TIA,
Daniel

Attachments:
- glusterclientlog.txt: <http://gluster.org/pipermail/gluster-users/attachments/20110106/fd8dec83/attachment-0002.txt>
- glusterserverlog.txt: <http://gluster.org/pipermail/gluster-users/attachments/20110106/fd8dec83/attachment-0003.txt>
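P.S. If it would help, the next time a glusterfsd goes to ~340% CPU I can try to capture a per-thread view and a backtrace of the spinning process, along these lines (a rough plan, not output I already have; <PID> stands for the busy glusterfsd and the output file name is just my choice):

+------------------------------------------------------------------------------+
# one batch iteration of top showing per-thread CPU usage for the busy brick
# process (<PID> = pid of the glusterfsd that is spinning)
top -b -H -n 1 -p <PID>

# dump backtraces of all threads of that process to a file for the list
gdb -p <PID> -batch -ex "thread apply all bt" > glusterfsd-backtrace.txt
+------------------------------------------------------------------------------+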