Hi!
The glusterfsd.log files on all nodes are virtually empty; the only entry from 2008-11-25 reads
2008-11-25 03:13:48 E [io-threads.c:273:iot_flush] sc1-ioth: fd context is NULL, returning EBADFD
on all nodes. I don't think that this is related to our problems.
Regards,
Fred
On 25.11.2008, at 13:17, Basavanagowda Kanur wrote:

Fred,
Can you also provide us the server logs?

--
gowda
On Tue, Nov 25, 2008 at 4:57 PM, Fred Hucht <fred@xxxxxxxxxxxxxx> wrote:

Hi devels!

We are considering GlusterFS as a parallel file server (8 server nodes) for our parallel Opteron cluster (88 nodes, ~500 cores), as well as for a unified nufa /scratch distributed over all nodes. We use the cluster in a scientific environment (theoretical physics) and run Scientific Linux with kernel 2.6.25.16. After similar problems with 1.3.x we installed 1.4.0qa61 and set up a /scratch for testing using the following script "glusterconf.sh", which runs locally on all nodes at startup and writes the two config files /usr/local/etc/glusterfs-{server,client}.vol:

---------------------------------- 8< snip >8 ----------------------------------
#!/bin/sh
HOST=$(hostname -s)
if [ $HOST = master ];then
  MASTER_IP=127.0.0.1
  HOST_IP=127.0.0.1
  HOST_N=0
else
  MASTER_IP=192.168.1.254
  HOST_IP=$(hostname -i)
  HOST_N=${HOST_IP##*.}
fi
LOCAL=sc$HOST_N

###################################################################
# write /usr/local/etc/glusterfs-server.vol
{
cat <<EOF
###
### Server config automatically created by $PWD/$0
###
EOF
if [ $HOST = master ];then
  SERVERVOLUMES="scns"
  cat <<EOF
volume scns
type storage/posix
option directory /export/scratch_ns
end-volume

EOF
else # if master
  SERVERVOLUMES=""
fi # if master
SERVERVOLUMES="$SERVERVOLUMES $LOCAL"
cat <<EOF
volume $LOCAL-posix
type storage/posix
option directory /export/scratch
end-volume

volume $LOCAL-locks
type features/posix-locks
subvolumes $LOCAL-posix
end-volume

volume $LOCAL-ioth
type performance/io-threads
option thread-count 4
subvolumes $LOCAL-locks
end-volume

volume $LOCAL
type performance/read-ahead
subvolumes $LOCAL-ioth
end-volume

volume server
type protocol/server
option transport-type tcp/server
subvolumes $SERVERVOLUMES
EOF
for vol in $SERVERVOLUMES;do
  cat <<EOF
option auth.addr.$vol.allow 127.0.0.1,192.168.1.*
EOF
done
cat <<EOF
end-volume
EOF
} > /usr/local/etc/glusterfs-server.vol

###################################################################
# write /usr/local/etc/glusterfs-client.vol
{
cat <<EOF
###
### Client config automatically created by $PWD/$0
###

volume scns
type protocol/client
option transport-type tcp/client
option remote-host $MASTER_IP
option remote-subvolume scns
end-volume

volume sc0
type protocol/client
option transport-type tcp/client
option remote-host $MASTER_IP
option remote-subvolume sc0
end-volume

EOF
UNIFY="sc0"
# leave out node66 at the moment...
for n in $(seq 65) $(seq 67 87);do
  VOL=sc$n
  UNIFY="$UNIFY $VOL"
  cat <<EOF
volume $VOL
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.$n
option remote-subvolume $VOL
end-volume

EOF
done
cat <<EOF
volume scratch
type cluster/unify
subvolumes $UNIFY
option namespace scns
option scheduler nufa
option nufa.limits.min-free-disk 15
option nufa.refresh-interval 10
option nufa.local-volume-name $LOCAL
end-volume

volume scratch-io-threads
type performance/io-threads
option thread-count 4
subvolumes scratch
end-volume

volume scratch-write-behind
type performance/write-behind
option aggregate-size 128kB
option flush-behind off
subvolumes scratch-io-threads
end-volume

volume scratch-read-ahead
type performance/read-ahead
option page-size 128kB   # unit in bytes
option page-count 2      # cache per file = (page-count x page-size)
subvolumes scratch-write-behind
end-volume

volume scratch-io-cache
type performance/io-cache
option cache-size 64MB
option page-size 512kB
subvolumes scratch-read-ahead
end-volume
EOF
} > /usr/local/etc/glusterfs-client.vol
---------------------------------- 8< snip >8 ----------------------------------
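To make the generated stack more concrete, the script above should produce a /usr/local/etc/glusterfs-server.vol roughly like the following on an ordinary compute node (shown for a hypothetical node 192.168.1.1, i.e. sc1; the header comment with the script path is omitted):

volume sc1-posix
type storage/posix
option directory /export/scratch            # local scratch disk
end-volume

volume sc1-locks
type features/posix-locks
subvolumes sc1-posix
end-volume

volume sc1-ioth
type performance/io-threads
option thread-count 4
subvolumes sc1-locks
end-volume

volume sc1
type performance/read-ahead
subvolumes sc1-ioth
end-volume

volume server
type protocol/server
option transport-type tcp/server
subvolumes sc1                              # the master additionally exports scns here
option auth.addr.sc1.allow 127.0.0.1,192.168.1.*
end-volume

The master additionally exports the scns namespace volume from /export/scratch_ns. On the client side the script stacks cluster/unify with the nufa scheduler over sc0 and sc1-sc87 (skipping sc66/node66 for now) plus the scns namespace, with io-threads, write-behind, read-ahead and io-cache on top.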
The cluster uses MPI over InfiniBand, while GlusterFS runs over TCP/IP Gigabit Ethernet. I use FUSE 2.7.4 with the patch fuse-2.7.3glfs10.diff (is that OK? The patch applied successfully).

Everything is fine until some of the nodes used by a job block on access to /scratch or, some time later, give

df: `/scratch': Transport endpoint is not connected

The glusterfs.log on node36 is flooded with

2008-11-25 07:30:35 E [client-protocol.c:243:call_bail] sc70: activating bail-out. pending frames = 3. last sent = 2008-11-25 07:29:52. last received = 2008-11-25 07:29:49. transport-timeout = 42
2008-11-25 07:30:35 C [client-protocol.c:250:call_bail] sc70: bailing transport
...(~100MB) (~2 lines for every node every 10 seconds)

Furthermore, I find at the end of glusterfs.log:

grep -v call_bail glusterfs.log
...
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
...
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc87: transport not connected to submit (priv->connected = 255)
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
2008-11-25 10:05:03 E [fuse-bridge.c:1886:fuse_statfs_cbk] glusterfs-fuse: 1353: ERR => -1 (Transport endpoint is not connected)

On node68 I find

2008-11-24 23:20:12 W [client-protocol.c:93:this_ino_set] sc0: inode number(201326854) changed for inode(0x6130d0)
2008-11-24 23:20:12 W [client-protocol.c:93:this_ino_set] scns: inode number(37749030) changed for inode(0x6130d0)
2008-11-24 23:20:58 E [client-protocol.c:243:call_bail] scns: activating bail-out. pending frames = 3. last sent = 2008-11-24 23:20:12. last received = 2008-11-24 23:20:12. transport-timeout = 42
2008-11-24 23:20:58 C [client-protocol.c:250:call_bail] scns: bailing transport
2008-11-24 23:20:58 E [client-protocol.c:243:call_bail] sc0: activating bail-out. pending frames = 3. last sent = 2008-11-24 23:20:12. last received = 2008-11-24 23:20:12. transport-timeout = 42
2008-11-24 23:20:58 C [client-protocol.c:250:call_bail] sc0: bailing transport
...(~100MB)

only for scns and sc0, and then

2008-11-25 10:01:31 E [client-protocol.c:243:call_bail] sc1: activating bail-out. pending frames = 1. last sent = 2008-11-25 10:00:46. last received = 2008-11-24 23:20:12. transport-timeout = 42
2008-11-25 10:01:31 C [client-protocol.c:250:call_bail] sc1: bailing transport
...(~100MB)

for all nodes, as well as

2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
2008-11-25 11:23:18 E [socket.c:1187:socket_submit] sc1: transport not connected to submit (priv->connected = 255)
2008-11-25 11:23:18 E [socket.c:1187:socket_submit] sc2: transport not connected to submit (priv->connected = 255)
...

The third affected node, node77, says:

2008-11-24 22:07:20 W [client-protocol.c:93:this_ino_set] sc0: inode number(201326854) changed for inode(0x7f97d6c0ac70)
2008-11-24 22:07:20 W [client-protocol.c:93:this_ino_set] scns: inode number(37749030) changed for inode(0x7f97d6c0ac70)
2008-11-24 22:08:07 E [client-protocol.c:243:call_bail] sc10: activating bail-out. pending frames = 7. last sent = 2008-11-24 22:07:24. last received = 2008-11-24 22:07:20. transport-timeout = 42
2008-11-24 22:08:07 C [client-protocol.c:250:call_bail] sc10: bailing transport
...(~100MB)

and then

2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
...
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc87: transport not connected to submit (priv->connected = 255)
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
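One observation on the logs: the "transport-timeout = 42" in the call_bail lines is, as far as I understand, the protocol/client timeout after which pending frames are bailed out. If it would help to rule out plain timeouts under load, I could raise it per client volume, e.g. (untested, assuming the option is still called transport-timeout in 1.4.0qa61; 120 is just an example value):

volume sc1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.1
option remote-subvolume sc1
option transport-timeout 120   # seconds before call_bail kicks in; 42 seems to be the current value
end-volume

Whether the bail-outs are the cause or only a symptom I cannot tell.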
As I said, similar problems occurred with version 1.3.x. If these problems cannot be solved, we will have to use a different file system, so any help is very much appreciated.

Have fun,

Fred

Dr. Fred Hucht <fred@xxxxxxxxxxxxxx>
Institute for Theoretical Physics
University of Duisburg-Essen, 47048 Duisburg, Germany

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel
--
hard work often pays off after time, but laziness always pays off now