Re: [EXTERNAL] Benchmarks using fio tool gets stuck


 



Testing with iperf, our network delivers about 940 Mbits/sec between nodes.
According to our network usage metrics for this cluster, hosts with OSDs have a peak traffic of about 200 Mbits/sec each, and the client that runs FIO peaks at about 300 Mbits/sec.
The network does not seem to be saturated.
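
As a quick check of those figures, here is a small sketch that turns a peak rate and the measured link capacity into a utilization percentage. The function name utilization_pct is just an illustration, and it only covers the averaged values our metrics report, so short bursts between samples would not show up in it.

```shell
#!/bin/sh
# Print peak traffic as a rounded percentage of link capacity.
# Both arguments are in the same unit (here Mbits/sec).
utilization_pct() {
  awk -v peak="$1" -v cap="$2" 'BEGIN { printf "%.0f\n", 100 * peak / cap }'
}

# Figures quoted above: 940 Mbits/sec measured with iperf.
echo "OSD host: $(utilization_pct 200 940)%"
echo "fio client: $(utilization_pct 300 940)%"
```

With the numbers above, this comes out at roughly a fifth of the link for each OSD host and about a third for the client.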





On Wed, Oct 5, 2016 at 4:16 PM, Will.Boege <Will.Boege@xxxxxxxxxx> wrote:

Because you do not have segregated networks, the cluster traffic is most likely drowning out the FIO user traffic.  This is made worse by the fact that there is only a 1 Gb link between the cluster nodes.

 

If you are planning on using this cluster for anything other than testing, you’ll want to re-evaluate your network architecture.

 

+  >= 10gbe

+ Dedicated cluster network
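
For reference, a dedicated cluster network is normally declared in ceph.conf with the "public network" and "cluster network" options. The sketch below just prints such a fragment; the helper name and the two subnets are placeholders (documentation address ranges), not values from this cluster.

```shell
#!/bin/sh
# Print a ceph.conf [global] fragment that separates client-facing
# traffic (public network) from OSD replication traffic (cluster network).
# $1 = public subnet in CIDR form, $2 = cluster subnet in CIDR form.
ceph_network_fragment() {
  cat <<EOF
[global]
public network = $1
cluster network = $2
EOF
}

# Placeholder subnets; substitute the real ones.
ceph_network_fragment 192.0.2.0/24 198.51.100.0/24
```

The OSD hosts need an interface on both subnets for this to take effect; monitors and clients only use the public one.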

 

 

From: Mario Rodríguez Molins <mariorodriguez@xxxxxxxxxx>
Date: Wednesday, October 5, 2016 at 8:38 AM
To: "Will.Boege" <Will.Boege@xxxxxxxxxx>
Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [EXTERNAL] Benchmarks using fio tool gets stuck

 

Hi, 

 

Currently, we do not have a separate cluster network, and our setup is:

 - 3 nodes for OSDs with 1 Gbps links. Each node runs a single OSD daemon, although we plan to increase the number of OSDs per host.

 - 3 virtual machines, also with 1 Gbps links, each running one monitor daemon (two of them also run a metadata server).

 - The two clients used for testing are also VMs.

 

In each run of the FIO tool, we perform the following steps (all of them on the client):

 1.- Create an rbd image of 1 GB within a pool and map this image to a block device

 2.- Create an ext4 filesystem on this block device

 3.- Unmap the device from the client

 4.- Before testing, drop caches (echo 3 | tee /proc/sys/vm/drop_caches && sync)

 5.- Perform the fio test, setting the pool and name of the rbd image. In each run, the block size used is changed.

 6.- Remove the image from the pool
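
The six steps above can be sketched roughly as follows. This is an illustration rather than our exact script: the bench_cycle and run helpers and the DRY_RUN switch are made up for the example, /dev/rbd0 is only a placeholder device name, and sync is placed before the cache drop (the kernel documentation suggests syncing first), which differs from the order written in step 4. By default the sketch only prints the commands; the real run needs root and a reachable cluster.

```shell
#!/bin/sh
# One benchmark cycle. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

bench_cycle() {
  pool="$1"; image="$2"; bs="$3"

  # 1. Create a 1 GB image (rbd sizes default to MB) and map it.
  run rbd create --size 1024 "$pool/$image"
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ rbd map $pool/$image"
    dev="/dev/rbd0"                      # placeholder device name
  else
    dev="$(rbd map "$pool/$image")"      # rbd map prints the device path
  fi

  # 2. Create the ext4 filesystem, 3. unmap the device.
  run mkfs.ext4 -q "$dev"
  run rbd unmap "$dev"

  # 4. Flush dirty pages, then drop caches.
  run sync
  run sh -c 'echo 3 > /proc/sys/vm/drop_caches'

  # 5. Run fio against the image through the rbd engine.
  run fio --name=bench --ioengine=rbd --clientname=admin \
    --pool="$pool" --rbdname="$image" --bs="$bs" \
    --iodepth=32 --rw=write --output-format=json

  # 6. Remove the image.
  run rbd rm "$pool/$image"
}

bench_cycle scbench image01 4k
```

Setting DRY_RUN=0 executes the commands instead of printing them.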

 

 

 

Thanks in advance!

 

On Wed, Oct 5, 2016 at 2:57 PM, Will.Boege <Will.Boege@xxxxxxxxxx> wrote:

What does your network setup look like?  Do you have a separate cluster network?

 

Can you explain how you are performing the FIO test? Are you mounting a volume through krbd and testing that from a different server?


On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins <mariorodriguez@xxxxxxxxxx> wrote:

Hello,

 

We are setting up a new Ceph cluster and running some benchmarks on it.

At this moment, our cluster consists of:

 - 3 nodes for OSDs. In our current configuration, there is one OSD daemon per node.

 - 3 nodes for monitors (MON). Two of these nodes also run a metadata server (MDS).

 

Benchmarks are performed with the tools that ceph/rados provides, as well as with the fio benchmark tool.

 

With the fio benchmark tool, we are having some issues. After some executions, the fio process gets stuck in a futex_wait_queue_me call:

# cat /proc/14413/stack

[<ffffffffa7af6622>] futex_wait_queue_me+0xd2/0x140

[<ffffffffa7af74bf>] futex_wait+0xff/0x260

[<ffffffffa7aa3a6d>] wake_up_q+0x2d/0x60

[<ffffffffa7af7d11>] futex_requeue+0x2c1/0x930

[<ffffffffa7af8fd1>] do_futex+0x2b1/0xb20

[<ffffffffa7badfb1>] handle_mm_fault+0x14e1/0x1cd0

[<ffffffffa7aa48e8>] wake_up_new_task+0x108/0x1a0

[<ffffffffa7af98c3>] SyS_futex+0x83/0x180

[<ffffffffa7a63981>] __do_page_fault+0x221/0x510

[<ffffffffa7fda736>] system_call_fast_compare_end+0xc/0x96

[<ffffffffffffffff>] 0xffffffffffffffff
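
The stack above is for a single task. Since fio with the rbd engine runs many threads, it may help to look at all of them. The following is a minimal sketch that walks /proc/PID/task; the function name dump_thread_stacks is only for illustration, and reading the stack files generally requires root.

```shell
#!/bin/sh
# Print the kernel stack of every thread of a process.
# $1 = process id. Unreadable stacks are reported instead of aborting.
dump_thread_stacks() {
  pid="$1"
  for task in /proc/"$pid"/task/*; do
    [ -d "$task" ] || continue
    tid="${task##*/}"
    name="$(cat "$task/comm" 2>/dev/null)"
    echo "== tid $tid ($name) =="
    cat "$task/stack" 2>/dev/null || echo "(stack not readable; try as root)"
  done
}

# Example: inspect the current shell; replace $$ with the fio pid.
dump_thread_stacks "$$"
```

A thread blocked somewhere other than a futex wait (for example in a socket call) would be the more interesting one to look at.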

 

The logs of the OSD and MON daemons do not show any information or errors that point to the problem.

 

Running strace on the fio process shows the following:

 

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632809, {1475609725, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

[pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0

[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0

[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632811, {1475609725, 348199000}, ffffffff <unfinished ...>

[pid 14429] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)

[pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0

[pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0

[pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 79103, {1475609727, 127563261}, ffffffff <unfinished ...>

[pid 14416] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)

[pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0

[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0

[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632813, {1475609725, 598199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

[pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0

[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0

[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632815, {1475609725, 848199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

[pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0

[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0

[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632817, {1475609726, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

[pid 14416] gettimeofday({1475609726, 98359}, NULL) = 0

[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0

[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 345711731}) = 0

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632819, {1475609726, 348199000}, ffffffff <unfinished ...>

[pid 14418] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)

[pid 14418] futex(0x7c1f08, FUTEX_WAKE_PRIVATE, 1) = 0

[pid 14418] clock_gettime(CLOCK_REALTIME, {1475609726, 103526543}) = 0

[pid 14418] futex(0x7c1f5c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 31641, {1475609731, 103526543}, ffffffff <unfinished ...>

[pid 14419] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)

....

 

[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730557149}) = 0

[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730727417}) = 0

[pid 14423] futex(0x7c8c34, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7c8b60, 15902 <unfinished ...>

[pid 14425] <... futex resumed> )       = 0

[pid 14425] futex(0x7c8b60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>

[pid 14423] <... futex resumed> )       = 1

[pid 14423] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>

[pid 14425] <... futex resumed> )       = 0

[pid 14425] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1) = 0

[pid 14425] clock_gettime(CLOCK_REALTIME, {1475609728, 731160249}) = 0

[pid 14425] sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"\200\4\364W\271\236\224+", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9

[pid 14425] futex(0x7c8c34, FUTEX_WAIT_PRIVATE, 15903, NULL <unfinished ...>

[pid 14423] <... futex resumed> )       = 1

[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 731811246}) = 0

[pid 14423] futex(0x775430, FUTEX_WAKE_PRIVATE, 1) = 0

[pid 14423] futex(0x775494, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15823, {1475609738, 731811246}, ffffffff <unfinished ...>

[pid 14426] <... restart_syscall resumed> ) = 1

[pid 14426] recvfrom(3, "\17\200\4\364W\271\236\224+", 4096, MSG_DONTWAIT, NULL, NULL) = 9

[pid 14426] clock_gettime(CLOCK_REALTIME, {1475609728, 732608460}) = 0

[pid 14426] poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished ...>

[pid 14417] <... futex resumed> )       = 0

[pid 14417] futex(0x771e28, FUTEX_WAKE_PRIVATE, 1) = 0

[pid 14417] futex(0x771eac, FUTEX_WAIT_PRIVATE, 32223, NULL <unfinished ...>

[pid 14416] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
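
To get an overview of a long trace like this one, the timeouts can be counted per thread. The sketch below assumes the trace was saved in the "[pid N] ..." format shown above (for instance with strace -f -p PID -o FILE, which drops the "[pid" prefix, so here the console output is assumed); the function name is hypothetical. Repeated ETIMEDOUT results on timed futex waits are usually periodic timer wake-ups rather than errors, so a high count alone does not identify the stuck thread.

```shell
#!/bin/sh
# Read strace output in "[pid N] ..." form on stdin and print
# "<pid> <count>" for every thread that logged an ETIMEDOUT result.
count_futex_timeouts() {
  awk '/ETIMEDOUT/ {
         pid = $2
         sub(/\]/, "", pid)      # "14416]" -> "14416"
         count[pid]++
       }
       END { for (p in count) print p, count[p] }' | sort
}

# Example with three lines in the same format as the trace above.
count_futex_timeouts <<'EOF'
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632809, {1475609725, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
[pid 14429] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
EOF
```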

 

 

This issue has appeared on both of our clients. Both run Debian Jessie, each with a different kernel:

 - kernel 3.16.7-ckt25-2+deb8u3
 - kernel 4.7.2-1~bpo8+1

The following package versions have been used on both clients:

- Ceph cluster 10.2.2 & FIO 2.1.11-2 

- Ceph cluster 10.2.3 & FIO 2.1.11-2

- Ceph cluster 10.2.3 & FIO 2.14

 

We launch the fio tool while varying settings such as block size and operation type.
This is a simplified snippet of the shell script used:

 

for operation in read write randread randwrite; do
  for rbd in 4K 64K 1M 4M; do
    for bs in 4k 64k 1M 4M; do
      # create rbd image with block size $rbd
      # drop caches

      # options before the second --name apply to all jobs (global section)
      fio --name=global \
        --ioengine=rbd \
        --clientname=admin \
        --pool=scbench \
        --rbdname=image01 \
        --bs=${bs} \
        --name=rbd_iodeph32 \
        --iodepth=32 \
        --rw=${operation} \
        --output-format=json

      sleep 10
      # delete rbd image
    done
  done
done
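
Since the loop blocks indefinitely once fio hangs, one possible workaround while the cause is unknown is to run each fio invocation under a deadline. The sketch below uses timeout from coreutils, which exits with status 124 when the limit is reached; the wrapper name run_with_deadline and the 600 second limit are assumptions for illustration, not something we currently use.

```shell
#!/bin/sh
# Run a command with a time limit in seconds.
# Returns the command's own exit status, or 124 if the limit was hit.
run_with_deadline() {
  limit="$1"; shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}

# Example with a harmless command; in the loop above this would be
#   run_with_deadline 600 fio --name=global ... || echo "fio run failed" >&2
run_with_deadline 5 true && echo "finished in time"
```

This does not fix the hang, but it lets the remaining combinations run and records which block size and operation type were in use when it happened.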

 

 

 

Any ideas why this could be happening? Are we missing some settings in the fio tool?

 

Regards, 

 

 

--

Mario Rodríguez
SRE
mariorodriguez@xxxxxxxxxx

+34 914 294 039 — 645 756 437
C/ Gran Vía, nº 28, 6ª planta — 28013 Madrid 
Tuenti Technologies, S.L.
www.tuenti.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 
