Hello,
We are setting up a new Ceph cluster and running some benchmarks on it.
At the moment, our cluster consists of:
- 3 OSD nodes. In our current configuration there is one OSD daemon per node.
- 3 monitor (MON) nodes. Two of these nodes also run a metadata server (MDS).
The benchmarks are run with the tools that ceph/rados provides, as well as with the fio benchmark tool.
With fio we are running into an issue: after some runs, the fio process gets stuck in a futex_wait_queue_me call:
# cat /proc/14413/stack
[<ffffffffa7af6622>] futex_wait_queue_me+0xd2/0x140
[<ffffffffa7af74bf>] futex_wait+0xff/0x260
[<ffffffffa7aa3a6d>] wake_up_q+0x2d/0x60
[<ffffffffa7af7d11>] futex_requeue+0x2c1/0x930
[<ffffffffa7af8fd1>] do_futex+0x2b1/0xb20
[<ffffffffa7badfb1>] handle_mm_fault+0x14e1/0x1cd0
[<ffffffffa7aa48e8>] wake_up_new_task+0x108/0x1a0
[<ffffffffa7af98c3>] SyS_futex+0x83/0x180
[<ffffffffa7a63981>] __do_page_fault+0x221/0x510
[<ffffffffa7fda736>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
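The stack above belongs to the main process only, while the strace output further down comes from several of its threads. A minimal sketch for dumping the kernel stack of every thread is shown here; it assumes root access to /proc, and PROC_ROOT is a hypothetical override that only exists so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Print the kernel stack of every thread of a process.
# Reading /proc/<pid>/task/<tid>/stack normally requires root.
# PROC_ROOT is a hypothetical override (default: /proc) used for dry runs.
dump_thread_stacks() {
    pid=$1
    proc_root=${PROC_ROOT:-/proc}
    for task in "$proc_root/$pid"/task/*; do
        [ -d "$task" ] || continue
        tid=${task##*/}
        # comm holds the thread name, when one has been set
        name=$(cat "$task/comm" 2>/dev/null || echo "?")
        echo "== tid $tid ($name) =="
        cat "$task/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Example: dump_thread_stacks 14413
```

Comparing two dumps taken a few seconds apart may show which threads are making progress and which are not.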
The logs of the OSD and MON daemons do not show any information or error that would point to the problem.
Tracing the fio process with strace shows the following:
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632809, {1475609725, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632811, {1475609725, 348199000}, ffffffff <unfinished ...>
[pid 14429] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0
[pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 79103, {1475609727, 127563261}, ffffffff <unfinished ...>
[pid 14416] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632813, {1475609725, 598199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632815, {1475609725, 848199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632817, {1475609726, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609726, 98359}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 345711731}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632819, {1475609726, 348199000}, ffffffff <unfinished ...>
[pid 14418] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 14418] futex(0x7c1f08, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14418] clock_gettime(CLOCK_REALTIME, {1475609726, 103526543}) = 0
[pid 14418] futex(0x7c1f5c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 31641, {1475609731, 103526543}, ffffffff <unfinished ...>
[pid 14419] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
....
[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730557149}) = 0
[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730727417}) = 0
[pid 14423] futex(0x7c8c34, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7c8b60, 15902 <unfinished ...>
[pid 14425] <... futex resumed> ) = 0
[pid 14425] futex(0x7c8b60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 14423] <... futex resumed> ) = 1
[pid 14423] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 14425] <... futex resumed> ) = 0
[pid 14425] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14425] clock_gettime(CLOCK_REALTIME, {1475609728, 731160249}) = 0
[pid 14425] sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"\200\4\364W\271\236\224+", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
[pid 14425] futex(0x7c8c34, FUTEX_WAIT_PRIVATE, 15903, NULL <unfinished ...>
[pid 14423] <... futex resumed> ) = 1
[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 731811246}) = 0
[pid 14423] futex(0x775430, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14423] futex(0x775494, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15823, {1475609738, 731811246}, ffffffff <unfinished ...>
[pid 14426] <... restart_syscall resumed> ) = 1
[pid 14426] recvfrom(3, "\17\200\4\364W\271\236\224+", 4096, MSG_DONTWAIT, NULL, NULL) = 9
[pid 14426] clock_gettime(CLOCK_REALTIME, {1475609728, 732608460}) = 0
[pid 14426] poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished ...>
[pid 14417] <... futex resumed> ) = 0
[pid 14417] futex(0x771e28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14417] futex(0x771eac, FUTEX_WAIT_PRIVATE, 32223, NULL <unfinished ...>
[pid 14416] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
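Most of the lines in this trace are timed futex waits that expire with ETIMEDOUT. To see how those timeouts are spread across threads, a saved trace can be summarised per thread ID. This is only a sketch: it assumes the strace output was written to a file (the file name in the example is hypothetical) and that every line starts with the "[pid N]" prefix shown above.

```shell
#!/bin/sh
# Count how many calls ended in ETIMEDOUT for each thread in a saved strace log.
# Output: one "<tid> <count>" line per thread, sorted by thread ID.
summarize_futex_timeouts() {
    awk '
        /^\[pid +[0-9]+\]/ && /ETIMEDOUT/ {
            tid = $2            # second field looks like "14416]"
            sub(/\]/, "", tid)  # strip the closing bracket
            count[tid]++
        }
        END { for (tid in count) print tid, count[tid] }
    ' "$1" | sort -n
}

# Example (hypothetical file name): summarize_futex_timeouts fio.strace
```

Threads that never appear in the summary, or that only ever show an unfinished wait with a NULL timeout, are candidates for a closer look.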
This issue has appeared on both of our clients. Both clients run Debian Jessie, each with a different kernel:
- kernel 3.16.7-ckt25-2+deb8u3
- kernel 4.7.2-1~bpo8+1
The following combinations of package versions have been tried on both clients:
- Ceph cluster 10.2.2 & FIO 2.1.11-2
- Ceph cluster 10.2.3 & FIO 2.1.11-2
- Ceph cluster 10.2.3 & FIO 2.14
We launch fio while varying settings such as block size and operation type.
This is a simplified snippet of the shell script we use:
for operation in read write randread randwrite; do
    for rbd in 4K 64K 1M 4M; do
        for bs in 4k 64k 1M 4M; do
            # create rbd image with block size $rbd
            # drop caches
            fio --name=global \
                --ioengine=rbd \
                --clientname=admin \
                --pool=scbench \
                --rbdname=image01 \
                --bs=${bs} \
                --name=rbd_iodepth32 \
                --iodepth=32 \
                --rw=${operation} \
                --output-format=json
            sleep 10
            # delete rbd image
        done
    done
done
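Because the hang only shows up after some runs, it may help to put a time limit on each fio call so the loop keeps going and records which combination stalled. The following is a sketch, not what our script currently does: run_with_limit is a hypothetical helper built on coreutils timeout, and the 600 second limit in the example is an arbitrary assumption.

```shell
#!/bin/sh
# Run a command under a time limit and report the outcome on stderr.
# timeout exits with 124 when the limit is hit; if the follow-up KILL
# (sent 5 seconds later via -k) was needed, the status is 137 instead.
run_with_limit() {
    limit=$1
    label=$2
    shift 2
    status=0
    timeout -k 5 "$limit" "$@" || status=$?
    case $status in
        124|137) echo "TIMEOUT $label" >&2 ;;
        0)       echo "OK $label" >&2 ;;
        *)       echo "FAILED($status) $label" >&2 ;;
    esac
    return "$status"
}

# Example inside the loop above:
#   run_with_limit 600 "${operation}/${rbd}/${bs}" fio --name=global ...
```

Note that a run killed this way can no longer be inspected, so for debugging the stuck process itself the limit should be left out.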
Any ideas why this could be happening? Are we missing some fio settings?
Regards,