Hi!
The glusterfsd.log files on all nodes are virtually empty; the only entry from 2008-11-25 reads
2008-11-25 03:13:48 E [io-threads.c:273:iot_flush] sc1-ioth: fd context is NULL, returning EBADFD
on all nodes. I don't think that this is related to our problems.
Regards,
Fred
On 25.11.2008, at 13:17, Basavanagowda Kanur wrote:

Fred,
Can you also provide us the server logs?

--
gowda
On Tue, Nov 25, 2008 at 4:57 PM, Fred Hucht <fred@xxxxxxxxxxxxxx> wrote:

Hi devels!

We are considering GlusterFS as a parallel file server (8 server nodes) for our parallel Opteron cluster (88 nodes, ~500 cores), as well as for a unified nufa /scratch distributed over all nodes. We use the cluster in a scientific environment (theoretical physics) and run Scientific Linux with kernel 2.6.25.16. After similar problems with 1.3.x we installed 1.4.0qa61 and set up a /scratch for testing using the following script "glusterconf.sh", which runs locally on all nodes at startup and writes the two config files /usr/local/etc/glusterfs-{server,client}.vol:

---------------------------------- 8< snip >8 ----------------------------------
#!/bin/sh
HOST=$(hostname -s)
if [ $HOST = master ];then
  MASTER_IP=127.0.0.1
  HOST_IP=127.0.0.1
  HOST_N=0
else
  MASTER_IP=192.168.1.254
  HOST_IP=$(hostname -i)
  HOST_N=${HOST_IP##*.}
fi
LOCAL=sc$HOST_N

###################################################################
# write /usr/local/etc/glusterfs-server.vol
{
cat <<EOF
###
### Server config automatically created by $PWD/$0
###
EOF
if [ $HOST = master ];then
  SERVERVOLUMES="scns"
  cat <<EOF
volume scns
type storage/posix
option directory /export/scratch_ns
end-volume

EOF
else # if master
  SERVERVOLUMES=""
fi # if master
SERVERVOLUMES="$SERVERVOLUMES $LOCAL"
cat <<EOF
volume $LOCAL-posix
type storage/posix
option directory /export/scratch
end-volume

volume $LOCAL-locks
type features/posix-locks
subvolumes $LOCAL-posix
end-volume

volume $LOCAL-ioth
type performance/io-threads
option thread-count 4
subvolumes $LOCAL-locks
end-volume

volume $LOCAL
type performance/read-ahead
subvolumes $LOCAL-ioth
end-volume

volume server
type protocol/server
option transport-type tcp/server
subvolumes $SERVERVOLUMES
EOF
for vol in $SERVERVOLUMES;do
  cat <<EOF
option auth.addr.$vol.allow 127.0.0.1,192.168.1.*
EOF
done
cat <<EOF
end-volume
EOF
} > /usr/local/etc/glusterfs-server.vol

###################################################################
# write /usr/local/etc/glusterfs-client.vol
{
cat <<EOF
###
### Client config automatically created by $PWD/$0
###

volume scns
type protocol/client
option transport-type tcp/client
option remote-host $MASTER_IP
option remote-subvolume scns
end-volume

volume sc0
type protocol/client
option transport-type tcp/client
option remote-host $MASTER_IP
option remote-subvolume sc0
end-volume

EOF
UNIFY="sc0"
# leave out node66 at the moment...
for n in $(seq 65) $(seq 67 87);do
  VOL=sc$n
  UNIFY="$UNIFY $VOL"
  cat <<EOF
volume $VOL
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.$n
option remote-subvolume $VOL
end-volume

EOF
done
cat <<EOF
volume scratch
type cluster/unify
subvolumes $UNIFY
option namespace scns
option scheduler nufa
option nufa.limits.min-free-disk 15
option nufa.refresh-interval 10
option nufa.local-volume-name $LOCAL
end-volume

volume scratch-io-threads
type performance/io-threads
option thread-count 4
subvolumes scratch
end-volume

volume scratch-write-behind
type performance/write-behind
option aggregate-size 128kB
option flush-behind off
subvolumes scratch-io-threads
end-volume

volume scratch-read-ahead
type performance/read-ahead
option page-size 128kB   # unit in bytes
option page-count 2      # cache per file = (page-count x page-size)
subvolumes scratch-write-behind
end-volume

volume scratch-io-cache
type performance/io-cache
option cache-size 64MB
option page-size 512kB
subvolumes scratch-read-ahead
end-volume
EOF
} > /usr/local/etc/glusterfs-client.vol
---------------------------------- 8< snip >8 ----------------------------------
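To make the generated stack more concrete, the script above should produce a /usr/local/etc/glusterfs-server.vol roughly like the following on an ordinary compute node (shown for a hypothetical node 192.168.1.1, i.e. sc1; the header comment with the script path is omitted):

volume sc1-posix
type storage/posix
option directory /export/scratch            # local scratch disk
end-volume

volume sc1-locks
type features/posix-locks
subvolumes sc1-posix
end-volume

volume sc1-ioth
type performance/io-threads
option thread-count 4
subvolumes sc1-locks
end-volume

volume sc1
type performance/read-ahead
subvolumes sc1-ioth
end-volume

volume server
type protocol/server
option transport-type tcp/server
subvolumes sc1                              # the master additionally exports scns here
option auth.addr.sc1.allow 127.0.0.1,192.168.1.*
end-volume

The master additionally exports the scns namespace volume from /export/scratch_ns. On the client side the script stacks cluster/unify with the nufa scheduler over sc0 and sc1-sc87 (skipping sc66/node66 for now) plus the scns namespace, with io-threads, write-behind, read-ahead and io-cache on top.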
The cluster uses MPI over InfiniBand, while GlusterFS runs over TCP/IP Gigabit Ethernet. I use FUSE 2.7.4 with the patch fuse-2.7.3glfs10.diff (is that OK? The patch applied successfully).

Everything is fine until some of the nodes used by a job block on access to /scratch or, some time later, give

df: `/scratch': Transport endpoint is not connected

The glusterfs.log on node36 is flooded with

2008-11-25 07:30:35 E [client-protocol.c:243:call_bail] sc70: activating bail-out. pending frames = 3. last sent = 2008-11-25 07:29:52. last received = 2008-11-25 07:29:49. transport-timeout = 42
2008-11-25 07:30:35 C [client-protocol.c:250:call_bail] sc70: bailing transport
...(~100MB) (~2 lines for every node every 10 seconds)

Furthermore, I find at the end of glusterfs.log:

grep -v call_bail glusterfs.log
...
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
...
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc87: transport not connected to submit (priv->connected = 255)
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
2008-11-25 10:05:03 E [fuse-bridge.c:1886:fuse_statfs_cbk] glusterfs-fuse: 1353: ERR => -1 (Transport endpoint is not connected)

On node68 I find

2008-11-24 23:20:12 W [client-protocol.c:93:this_ino_set] sc0: inode number(201326854) changed for inode(0x6130d0)
2008-11-24 23:20:12 W [client-protocol.c:93:this_ino_set] scns: inode number(37749030) changed for inode(0x6130d0)
2008-11-24 23:20:58 E [client-protocol.c:243:call_bail] scns: activating bail-out. pending frames = 3. last sent = 2008-11-24 23:20:12. last received = 2008-11-24 23:20:12. transport-timeout = 42
2008-11-24 23:20:58 C [client-protocol.c:250:call_bail] scns: bailing transport
2008-11-24 23:20:58 E [client-protocol.c:243:call_bail] sc0: activating bail-out. pending frames = 3. last sent = 2008-11-24 23:20:12. last received = 2008-11-24 23:20:12. transport-timeout = 42
2008-11-24 23:20:58 C [client-protocol.c:250:call_bail] sc0: bailing transport
...(~100MB)

only for scns and sc0, and then

2008-11-25 10:01:31 E [client-protocol.c:243:call_bail] sc1: activating bail-out. pending frames = 1. last sent = 2008-11-25 10:00:46. last received = 2008-11-24 23:20:12. transport-timeout = 42
2008-11-25 10:01:31 C [client-protocol.c:250:call_bail] sc1: bailing transport
...(~100MB)

for all nodes, as well as

2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
2008-11-25 11:23:18 E [socket.c:1187:socket_submit] sc1: transport not connected to submit (priv->connected = 255)
2008-11-25 11:23:18 E [socket.c:1187:socket_submit] sc2: transport not connected to submit (priv->connected = 255)
...

The third affected node, node77, says:

2008-11-24 22:07:20 W [client-protocol.c:93:this_ino_set] sc0: inode number(201326854) changed for inode(0x7f97d6c0ac70)
2008-11-24 22:07:20 W [client-protocol.c:93:this_ino_set] scns: inode number(37749030) changed for inode(0x7f97d6c0ac70)
2008-11-24 22:08:07 E [client-protocol.c:243:call_bail] sc10: activating bail-out. pending frames = 7. last sent = 2008-11-24 22:07:24. last received = 2008-11-24 22:07:20. transport-timeout = 42
2008-11-24 22:08:07 C [client-protocol.c:250:call_bail] sc10: bailing transport
...(~100MB)

and then

2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
...
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc87: transport not connected to submit (priv->connected = 255)
2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
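One observation on the logs: the "transport-timeout = 42" in the call_bail lines is, as far as I understand, the protocol/client timeout after which pending frames are bailed out. If it would help to rule out plain timeouts under load, I could raise it per client volume, e.g. (untested, assuming the option is still called transport-timeout in 1.4.0qa61; 120 is just an example value):

volume sc1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.1
option remote-subvolume sc1
option transport-timeout 120   # seconds before call_bail kicks in; 42 seems to be the current value
end-volume

Whether the bail-outs are the cause or only a symptom I cannot tell.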
As I said, similar problems occurred with version 1.3.x. If these problems cannot be solved, we will have to use a different file system, so any help is very much appreciated.

Have fun,

Fred

Dr. Fred Hucht <fred@xxxxxxxxxxxxxx>
Institute for Theoretical Physics
University of Duisburg-Essen, 47048 Duisburg, Germany

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel
--
hard work often pays off after time, but laziness always pays off now