Re: socket time out in fio benchmark

"shuau li" <lishuai_ujs@xxxxxxx> · Fri, 14 Nov 2014 10:12:30 +0800 (CST)

Hi all,

    I have now tried glusterfs 3.6.0 with the same configuration. The situation has not improved: after several hours, the bricks' logs still report "socket disconnection".
    I suspected the problem might be caused by epoll, so I patched the system with "http://review.gluster.org/#/c/3842/13//COMMIT_MSG", but that does not seem to help either.
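
To see how often the bricks log these timeouts, the matching lines can be counted per log file. This is only a minimal sketch: the log directory /var/log/glusterfs/bricks is an assumption (override LOG_DIR if the bricks log elsewhere), count_timeouts is a made-up helper name, and the default pattern is simply the warning text quoted in the original report.

```shell
#!/bin/sh
# Count lines in one log file that contain a given fixed string.
# Usage: count_timeouts LOGFILE [PATTERN]
count_timeouts() {
    logfile=$1
    # Default pattern: the warning text from the original report.
    pattern=${2:-'readv failed (Connection timed out)'}
    # -F: treat the pattern as a fixed string; -c: print the match count.
    grep -F -c -- "$pattern" "$logfile"
}

# Assumed log location; adjust to where the bricks actually log.
LOG_DIR=${LOG_DIR:-/var/log/glusterfs/bricks}

for f in "$LOG_DIR"/*.log; do
    [ -f "$f" ] || continue
    printf '%s: %s\n' "$f" "$(count_timeouts "$f")"
done
```

Running it on each node before and after a configuration change gives a rough way to compare how frequently the warning appears.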

    shuai li

At 2014-11-11 13:58:20, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx> wrote:

    On 11/11/2014 11:21 AM, shuau li wrote:

          Hi everyone,

              I am trying to run multiple jobs with the fio benchmark on a
              replica volume with 3 bricks, but after some hours the
              following warning appears in the brick logs:

              W [socket.c:195:__socket_rwv] 0-tcp.ida-server: readv failed (Connection timed out)

              I think this warning may be due to the high workload: under
              heavy load, glusterfs cannot respond on the socket in time.
              So I added code to rpc/rpc-transport/socket/src/socket.c to
              raise the socket timeout thresholds. SO_RCVTIMEO is now 180s
              and KEEP_ALIVE is 300s. I then ran the workload again, but it
              did not help.

          My test environment is as follows:

              Three nodes form the gluster cluster. Each node has 16GB of
              memory, an 8-core 3.3GHz CPU, two 10000baseT/full network
              cards and one 1000baseT/full network card, and uses 16 * 2T
              RAID5 disks as its brick. The glusterfs version is 3.3.1.

    Just wanted to know if there is a possibility of
      using newer versions. No new releases are being made from 3.3.x.
      Could you try the case on newer versions and let us know your
      findings? If for some reason you can't upgrade, then we can
      probably start debugging this in 3.3.x. But I highly recommend you
      use a version that is active, i.e. at least 3.4.x.

      Pranith

              I created a 1*3 replica volume from these three nodes.
              Every node mounts the volume with fuse through one
              10000baseT/full network card. At the same time, every node
              mounts the fuse_mount_point with cifs through the other
              10000baseT/full card.

              Each node runs two fio scripts, one with read jobs and one
              with write jobs. Both scripts operate in the
              cifs_mount_point. The scripts are as follows:

              write_jobs:

              # DIR and i are set elsewhere (not shown in the original post)
              while true
              do
                  mkdir -p "${DIR}_write_${i}"
                  /usr/local/bin/fio --ioengine=libaio --iodepth=256 \
                      --numjobs=100 --rw=write --bs=1k --size=1000m \
                      --directory="${DIR}_write_${i}" \
                      --name=job01_1k_write >> "${DIR}_write_${i}/job01_1k_write.log"
                  i=`expr $i + 1`
              done

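              read_jobs:

              # DIR and i are set elsewhere (not shown in the original post).
              # The loop header was missing from the post; it is restored
              # here to match the trailing "done" and the write script.
              while true
              do
                  mkdir -p "${DIR}_read_${i}"
                  /usr/local/bin/fio --ioengine=libaio --iodepth=256 \
                      --numjobs=100 --rw=read --bs=1k --size=1000m \
                      --directory="${DIR}_read_${i}" \
                      --name=job01_1k_read >> "${DIR}_read_${i}/job01_1k_read.log"
                  i=`expr $i + 1`
              done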

              I changed iodepth from 256 to 16 and numjobs from 100 to 25,
              but the problem persists. Has anybody looked into this
              problem?
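
For a sense of scale, the load implied by these fio options can be worked out with a little shell arithmetic. This is a rough sketch under two assumptions: each of the numjobs workers may queue up to iodepth requests (an upper bound, not a measured value), and --size applies to each job separately with the m suffix read as MiB. The function names are made up for illustration.

```shell
#!/bin/sh
# Upper bound on in-flight I/O requests for one fio invocation:
# each of the numjobs workers may queue up to iodepth requests.
max_inflight() {
    iodepth=$1
    numjobs=$2
    echo $((iodepth * numjobs))
}

# Total data touched by one fio invocation, in MiB, assuming
# --size applies to each job separately.
total_mib() {
    size_mib=$1
    numjobs=$2
    echo $((size_mib * numjobs))
}

echo "original: $(max_inflight 256 100) requests, $(total_mib 1000 100) MiB"
echo "reduced:  $(max_inflight 16 25) requests, $(total_mib 1000 25) MiB"
```

Under these assumptions the reduced settings cut the request bound from 25600 to 400 per script, while each invocation still covers 25000 MiB in 1k blocks.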

      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
