Gluster IO thread unresponsive

Hi everyone,

 I hope you can help me, or at least point me in the right direction.

 I'm running stress tests on a 16-disk striped setup, using 300 MB
files created with "dd if=/dev/zero of=smalltest.file bs=1048576
count=300". The tests consist of reading those files back in a loop
with "dd of=/dev/null if=smalltest.file bs=1048576 count=300".
Both the server and the client machines are dual-core Intel Xeon systems
connected by a 10 Gbit link, and the OS is Ubuntu 10.10 with kernel
2.6.36.2. The tests were done with a freshly compiled git version of
GlusterFS (v3.1.1-52-gcbba1c3).
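
For anyone who wants to reproduce the test, the read loop can be sketched roughly as follows. The function name stress_read, the mount point in the usage line, and the iteration count are assumptions for illustration; only the two dd invocations come from the test described above.

```shell
#!/bin/sh
# Hypothetical reproduction sketch: create one file with dd, then read it
# back repeatedly. Only the dd command lines are taken from the post.
stress_read() {
    dir=$1        # directory on the GlusterFS mount (assumed path)
    blocks=$2     # number of 1 MiB blocks (300 in the original test)
    loops=$3      # how many times to read the file back

    dd if=/dev/zero of="$dir/smalltest.file" bs=1048576 count="$blocks" \
        2>/dev/null || return 1

    i=0
    while [ "$i" -lt "$loops" ]; do
        # Stop at the first failed read so the failure time is easy to spot.
        dd of=/dev/null if="$dir/smalltest.file" bs=1048576 count="$blocks" \
            2>/dev/null || {
            echo "read $i failed" >&2
            return 1
        }
        i=$((i + 1))
    done
    echo "completed $loops reads"
}
```

Usage, assuming the volume is mounted at /mnt/stripe1: stress_read /mnt/stripe1 300 100000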

After some time (always after a couple of hours), during which GlusterFS
works well and only reads the created files, the CPU usage of one of the
glusterfsd threads goes up to 340% (almost all of the 4 cores), and the
thread seems to become unresponsive.
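
One way to narrow down which thread is spinning is to list per-thread CPU usage. The sketch below assumes the procps version of ps on Linux; the function name busiest_threads is made up for this example.

```shell
# List the busiest threads of a process, highest CPU share first.
# Note: pcpu in ps is CPU time divided by elapsed time since the thread
# started, so it is an average, not an instantaneous reading.
busiest_threads() {
    pid=$1
    count=${2:-5}   # number of threads to show (default 5)
    ps -L -p "$pid" -o tid=,pcpu=,comm= --sort=-pcpu | head -n "$count"
}
```

Usage: for p in $(pidof glusterfsd); do busiest_threads "$p" 3; done. For a live view, "top -H -p <pid>" shows the same per-thread breakdown, and once the busy process is known, "gdb -p <pid> -batch -ex 'thread apply all bt'" should print a backtrace for each of its threads (assuming gdb and debug symbols are available).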

Is this a known issue, or is there a mistake on my part (configs, etc.)?

The server logs contain no relevant information (attached file), but the
client logs show the following (also attached):

[2011-01-05 20:43:41.258672] E
[client-handshake.c:116:rpc_client_ping_timer_expired] stripe1-client-8:
Server 10.0.0.1:24033 has not responded in the last 42 seconds,
disconnecting.
[2011-01-05 20:43:41.271821] E [rpc-clnt.c:338:saved_frames_unwind]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
[0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1)
op(READ(12)) called at 2011-01-05 20:42:26.310399
[2011-01-05 20:43:41.272034] E [rpc-clnt.c:338:saved_frames_unwind]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
[0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1)
op(READ(12)) called at 2011-01-05 20:42:26.310523
[2011-01-05 20:43:41.272088] E [rpc-clnt.c:338:saved_frames_unwind]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
[0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1)
op(READ(12)) called at 2011-01-05 20:42:26.310556
[2011-01-05 20:43:41.272140] E [rpc-clnt.c:338:saved_frames_unwind]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
[0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1)
op(READ(12)) called at 2011-01-05 20:42:26.310575
[2011-01-05 20:43:41.272191] E [rpc-clnt.c:338:saved_frames_unwind]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
[0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1)
op(READ(12)) called at 2011-01-05 20:42:26.310595
[2011-01-05 20:43:41.272241] E [rpc-clnt.c:338:saved_frames_unwind]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
[0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1)
op(READ(12)) called at 2011-01-05 20:42:26.310618
[2011-01-05 20:43:41.272292] E [rpc-clnt.c:338:saved_frames_unwind]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88) [0x7f032d9f6678]
(-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d)
[0x7f032d9f5ddd] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7f032d9f5d3e]))) rpc-clnt: forced unwinding frame type(GlusterFS
Handshake) op(PING(3)) called at 2011-01-05 20:42:59.256051
[2011-01-05 20:43:41.272315] I [client.c:1590:client_rpc_notify]
stripe1-client-8: disconnected
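
The timestamps in this excerpt are consistent with the 42-second timeout named in the first message: the PING frame was called at 20:42:59.256051 and the timer expired at 20:43:41.258672, about 42.003 seconds later. The READ frames had been pending since 20:42:26, roughly 75 seconds before they were unwound. To get a quick overview of a longer client log, the unwound frames can be counted per operation. This is a sketch that assumes the unwrapped log file, with one message per line (unlike the mail-wrapped excerpt above); the function name summarize_unwinds is hypothetical.

```shell
# Count forcibly unwound frames per operation in a GlusterFS client log.
# Assumes one log message per line, as in the original log file.
summarize_unwinds() {
    grep 'forced unwinding' "$1" \
        | grep -o 'op([A-Z_]*([0-9]*))' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

Usage: summarize_unwinds glusterclientlog.txt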


The configuration for each subvolume is:
+------------------------------------------------------------------------------+
volume stripe1-posix
    type storage/posix
    option directory /data_b
end-volume

volume stripe1-access-control
    type features/access-control
    subvolumes stripe1-posix
end-volume

volume stripe1-locks
    type features/locks
    subvolumes stripe1-access-control
end-volume

volume stripe1-io-threads
    type performance/io-threads
    subvolumes stripe1-locks
end-volume

volume /data_b
    type debug/io-stats
    subvolumes stripe1-io-threads
end-volume

volume stripe1-server
    type protocol/server
    option transport-type tcp
    option auth.addr./data_b.allow *
    subvolumes /data_b
end-volume
+------------------------------------------------------------------------------+



The server/client configuration used is:
+------------------------------------------------------------------------------+
volume stripe1-client-0
    type protocol/client
    option remote-host 10.0.0.1
    option remote-subvolume /data_b
    option transport-type tcp
end-volume

<repeat 15 times for each subvolume>

volume stripe1-stripe-0
    type cluster/stripe
    option block-size 1MB
    subvolumes stripe1-client-0 stripe1-client-1 stripe1-client-2 stripe1-client-3 stripe1-client-4 stripe1-client-5 stripe1-client-6 stripe1-client-7 stripe1-client-8 stripe1-client-9 stripe1-client-10 stripe1-client-11 stripe1-client-12 stripe1-client-13 stripe1-client-14 stripe1-client-15
end-volume

volume stripe1-write-behind
    type performance/write-behind
    subvolumes stripe1-stripe-0
end-volume

volume stripe1-read-ahead
    type performance/read-ahead
    option page-count 128
    option page-size 8388608
    subvolumes stripe1-write-behind
end-volume

volume stripe1-io-cache
    type performance/io-cache
    option cache-size 1GB
    subvolumes stripe1-read-ahead
end-volume

volume stripe1-quick-read
    type performance/quick-read
    subvolumes stripe1-io-cache
end-volume

volume stripe1-stat-prefetch
    type performance/stat-prefetch
    subvolumes stripe1-quick-read
end-volume

volume stripe1
    type debug/io-stats
    subvolumes stripe1-stat-prefetch
end-volume
+------------------------------------------------------------------------------+
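
For reference, the numbers in this configuration work out as sketched below. The calculation assumes that cluster/stripe places blocks round-robin across the subvolumes and that the read-ahead window per open file is page-count times page-size; whether this GlusterFS version clamps either option has not been checked here. The function names are hypothetical.

```shell
# Blocks per subvolume for one file, assuming round-robin striping.
stripe_layout() {
    file_blocks=$1   # 300 blocks of 1 MiB, from the dd command
    subvols=$2       # 16: stripe1-client-0 .. stripe1-client-15
    base=$((file_blocks / subvols))
    extra=$((file_blocks % subvols))
    echo "$extra subvolumes hold $((base + 1)) blocks, $((subvols - extra)) hold $base"
}

# Read-ahead window per open file, assuming page-count * page-size.
readahead_window() {
    window=$(( $1 * $2 ))
    echo "$window bytes ($((window / 1048576)) MiB)"
}

stripe_layout 300 16            # 12 subvolumes hold 19 blocks, 4 hold 18
readahead_window 128 8388608    # 1073741824 bytes (1024 MiB)
```

Under these assumptions the read-ahead window (1 GiB) is larger than a whole 300 MB test file and equal to the configured io-cache cache-size of 1GB.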

TIA,
Daniel
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: glusterclientlog.txt
URL: <http://gluster.org/pipermail/gluster-users/attachments/20110106/fd8dec83/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: glusterserverlog.txt
URL: <http://gluster.org/pipermail/gluster-users/attachments/20110106/fd8dec83/attachment-0003.txt>

