Re: Fwd: Ceph OSD suicide himself

On Mon, Jul 11, 2016 at 1:21 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> On Mon, Jul 11, 2016 at 11:48:57AM +0900, 한승진 wrote:
>> Hi cephers.
>>
>> I need your help for some issues.
>>
>> The ceph cluster version is Jewel (10.2.1), and the filesystem is btrfs.
>>
>> I run 1 MON and 48 OSDs on 4 nodes (each node has 12 OSDs).
>>
>> I've experienced one of the OSDs killing itself.
>>
>> It always logs a suicide timeout message.
>>
>> Below are the detailed logs.
>>
>>
>> ==================================================================
>> 0. ceph df detail
>> $ sudo ceph df detail
>> GLOBAL:
>>     SIZE       AVAIL      RAW USED     %RAW USED     OBJECTS
>>     42989G     24734G       18138G         42.19      23443k
>> POOLS:
>>     NAME        ID     CATEGORY     QUOTA OBJECTS     QUOTA BYTES     USED       %USED     MAX AVAIL     OBJECTS      DIRTY      READ       WRITE      RAW USED
>>     ha-pool     40     -            N/A               N/A             1405G       9.81     5270G         22986458     22447k     0          22447k     4217G
>>     volumes     45     -            N/A               N/A             4093G      28.57     5270G         933401       911k       648M       649M       12280G
>>     images      46     -            N/A               N/A             53745M      0.37     5270G         6746         6746       1278k      21046      157G
>>     backups     47     -            N/A               N/A             0           0        5270G         0            0          0          0          0
>>     vms         48     -            N/A               N/A             309G        2.16     5270G         79426        79426      92612k     46506k     928G
>>
>> 1. ceph no.15 log
>>
>> *(20:02: first "timed out" message)*
>> 2016-07-08 20:02:01.049483 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
>> 2016-07-08 20:02:01.050403 7fcd3b2a2700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
>> 2016-07-08 20:02:01.086792 7fcd3b2a2700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
>> .
>> .
>> (sometimes accompanied by slow request warnings like these:)
>> 2016-07-08 20:02:11.379597 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> 12 slow requests, 5 included below; oldest blocked for > 30.269577 secs
>> 2016-07-08 20:02:11.379608 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.269577 seconds old, received at 2016-07-08 20:01:41.109937:
>> osd_op(client.895668.0:5302745 45.e2e779c2
>> rbd_data.cc460bc7fc8f.00000000000004d8 [stat,write 2596864~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently commit_sent
>> 2016-07-08 20:02:11.379612 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.269108 seconds old, received at 2016-07-08 20:01:41.110406:
>> osd_op(client.895668.0:5302746 45.e2e779c2
>> rbd_data.cc460bc7fc8f.00000000000004d8 [stat,write 3112960~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> 2016-07-08 20:02:11.379630 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.268607 seconds old, received at 2016-07-08 20:01:41.110907:
>> osd_op(client.895668.0:5302747 45.e2e779c2
>> rbd_data.cc460bc7fc8f.00000000000004d8 [stat,write 3629056~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> 2016-07-08 20:02:11.379633 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.268143 seconds old, received at 2016-07-08 20:01:41.111371:
>> osd_op(client.895668.0:5302748 45.e2e779c2
>> rbd_data.cc460bc7fc8f.00000000000004d8 [stat,write 4145152~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> 2016-07-08 20:02:11.379636 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.267662 seconds old, received at 2016-07-08 20:01:41.111852:
>> osd_op(client.895668.0:5302749 45.e2e779c2
>> rbd_data.cc460bc7fc8f.00000000000004d8 [stat,write 4661248~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> .
>> .
>> (after many more of the same messages)
>> 2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2d286700' had timed out after 15
>> 2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2da87700' had timed out after 15
>> 2016-07-08 20:03:53.682829 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'FileStore::op_tp thread 0x7fcd48716700' had timed out after 60
>> 2016-07-08 20:03:53.682830 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'FileStore::op_tp thread 0x7fcd47f15700' had timed out after 60
>> .
>> .
>> ("fault with nothing to send, going to standby" messages)
>> 2016-07-08 20:03:53.708665 7fcd15787700  0 -- 10.200.10.145:6818/6462 >>
>> 10.200.10.146:6806/4642 pipe(0x55818727e000 sd=276 :51916 s=2 pgs=2225 cs=1
>> l=0 c=0x558186f61d80).fault with nothing to send, going to standby
>> 2016-07-08 20:03:53.724928 7fcd072c2700  0 -- 10.200.10.145:6818/6462 >>
>> 10.200.10.146:6800/4336 pipe(0x55818a25b400 sd=109 :6818 s=2 pgs=2440 cs=13
>> l=0 c=0x55818730f080).fault with nothing to send, going to standby
>> 2016-07-08 20:03:53.738216 7fcd0b7d3700  0 -- 10.200.10.145:6818/6462 >>
>> 10.200.10.144:6814/5069 pipe(0x55816c6a4800 sd=334 :53850 s=2 pgs=43 cs=1
>> l=0 c=0x55818611f800).fault with nothing to send, going to standby
>> 2016-07-08 20:03:53.739698 7fcd16f9f700  0 -- 10.200.10.145:6818/6462 >>
>> 10.200.10.144:6810/4786 pipe(0x558189e9b400 sd=55 :54650 s=2 pgs=1 cs=1 l=0
>> c=0x558184f7b000).fault with nothing to send, going to standby
>> .
>> .
>> (finally, the OSD killed itself)
>> 2016-07-08 20:04:13.254472 7fcd52c50700  1 heartbeat_map is_healthy
>> 'FileStore::op_tp thread 0x7fcd48716700' had timed out after 60
>> 2016-07-08 20:04:13.254474 7fcd52c50700  1 heartbeat_map is_healthy
>> 'FileStore::op_tp thread 0x7fcd47f15700' had timed out after 60
>> 2016-07-08 20:04:18.254546 7fcd52c50700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2927e700' had timed out after 15
>> 2016-07-08 20:04:18.254560 7fcd52c50700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2927e700' had suicide timed out after 150
>> 2016-07-08 20:04:18.401770 7fcd52c50700 -1 common/HeartbeatMap.cc: In
>> function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*,
>> const char*, time_t)' thread 7fcd52c50700 time 2016-07-08 20:04:18.254578
>> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
>
> An OSD operation thread failed to complete an operation within the grace
> period and suicided. This is normally an indication of a problem at the system
> level as a system call has taken too long to complete (for example a "write").
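
For reference, the "had timed out after 15" / "suicide timed out after 150" lines above come from the OSD op thread timeouts, and the "after 60" lines from the FileStore op thread timeout. If you want to confirm (or temporarily relax) the values in effect, here is a rough sketch using the admin socket on the affected node (osd.15 is just the id from your log; adjust to your setup):

    # show the configured thread timeouts for this OSD
    sudo ceph daemon osd.15 config get osd_op_thread_timeout
    sudo ceph daemon osd.15 config get osd_op_thread_suicide_timeout
    sudo ceph daemon osd.15 config get filestore_op_thread_timeout

    # raise the suicide timeout at runtime while debugging (not persistent)
    ceph tell osd.15 injectargs '--osd_op_thread_suicide_timeout 300'

Raising the timeouts only masks the symptom, though; the stalled system call still has to be tracked down.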
>
>>
>>  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x80) [0x55815ee03d60]
>>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>> const*, long)+0x259) [0x55815ed3c199]
>>  3: (ceph::HeartbeatMap::is_healthy()+0xe6) [0x55815ed3cac6]
>>  4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x55815ed3d31c]
>>  5: (CephContextServiceThread::entry()+0x167) [0x55815ee1c5c7]
>>  6: (()+0x76fa) [0x7fcd5775f6fa]
>>  7: (clone()+0x6d) [0x7fcd557dab5d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> The dump logs are almost the same.
>>
>> ==================================================================
>>
>> 2. Server syslog
>>
>> (from 20:02, heartbeat_check messages)
>> Jul  8 20:02:20 localhost ceph-osd[2705]: 2016-07-08 20:02:20.423548
>> 7fbab0b0b700 -1 osd.14 30969 heartbeat_check: no reply from osd.15 since
>> back 2016-07-08 20:02:00.011707 front 2016-07-08 20:02:00.011707 (cutoff
>> 2016-07-08 20:02:00.423542)
>> Jul  8 20:02:21 localhost ceph-osd[2705]: 2016-07-08 20:02:21.423660
>> 7fbab0b0b700 -1 osd.14 30969 heartbeat_check: no reply from osd.15 since
>> back 2016-07-08 20:02:00.011707 front 2016-07-08 20:02:00.011707 (cutoff
>> 2016-07-08 20:02:01.423655)
>> Jul  8 20:02:21 localhost ceph-osd[4471]: 2016-07-08
>> .
>> .
>> Jul  8 20:03:44 localhost kernel: [24000.909905] INFO: task
>> wb_throttle:6695 blocked for more than 120 seconds.
>> Jul  8 20:03:44 localhost kernel: [24000.965687]       Not tainted
>> 4.4.0-28-generic #47-Ubuntu
>> Jul  8 20:03:44 localhost kernel: [24001.022418] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Jul  8 20:03:44 localhost kernel: [24001.080286] wb_throttle     D
>> ffff88038c6af818     0  6695      1 0x00000000
>> Jul  8 20:03:44 localhost kernel: [24001.080292]  ffff88038c6af818
>> 0000000000000000 ffff88081d598000 ffff8807aa4dc4c0
>> Jul  8 20:03:44 localhost kernel: [24001.080294]  ffff88038c6b0000
>> ffff8806e2f7eea0 ffff8806e2f7eeb8 ffff88038c6af858
>> Jul  8 20:03:44 localhost kernel: [24001.080296]  ffff8806e2f7ee98
>> ffff88038c6af830 ffffffff818235b5 ffff8806e2f7ee38
>> Jul  8 20:03:44 localhost kernel: [24001.080298] Call Trace:
>> Jul  8 20:03:44 localhost kernel: [24001.080310]  [<ffffffff818235b5>]
>> schedule+0x35/0x80
>> Jul  8 20:03:44 localhost kernel: [24001.080348]  [<ffffffffc034c9c9>]
>> btrfs_tree_lock+0x79/0x210 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080353]  [<ffffffff810c3ca0>] ?
>> wake_atomic_t_function+0x60/0x60
>> Jul  8 20:03:44 localhost kernel: [24001.080364]  [<ffffffffc02e9c43>]
>> push_leaf_left+0xd3/0x1b0 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080373]  [<ffffffffc02ea33b>]
>> split_leaf+0x4fb/0x710 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080383]  [<ffffffffc02eaef0>]
>> btrfs_search_slot+0x9a0/0x9f0 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080393]  [<ffffffffc02ec851>]
>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080406]  [<ffffffffc038be9f>]
>> copy_items+0x139/0x851 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080416]  [<ffffffffc02edbe9>] ?
>> btrfs_search_forward+0x2d9/0x350 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080428]  [<ffffffffc038d4cb>]
>> btrfs_log_inode+0x69e/0xe45 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080441]  [<ffffffffc0351d37>]
>> btrfs_log_inode_parent+0x2a7/0xee0 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080454]  [<ffffffffc0334247>] ?
>> extent_writepages+0x67/0x90 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080467]  [<ffffffffc03156b0>] ?
>> btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080479]  [<ffffffffc030ff11>] ?
>> wait_current_trans.isra.21+0x31/0x120 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080490]  [<ffffffffc030f987>] ?
>> join_transaction.isra.15+0x27/0x400 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080503]  [<ffffffffc0353952>]
>> btrfs_log_dentry_safe+0x62/0x80 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080515]  [<ffffffffc032774f>]
>> btrfs_sync_file+0x28f/0x3b0 [btrfs]
>> Jul  8 20:03:44 localhost kernel: [24001.080520]  [<ffffffff8124108b>]
>> vfs_fsync_range+0x4b/0xb0
>> Jul  8 20:03:44 localhost kernel: [24001.080522]  [<ffffffff8124114d>]
>> do_fsync+0x3d/0x70
>> Jul  8 20:03:44 localhost kernel: [24001.080524]  [<ffffffff81241403>]
>> SyS_fdatasync+0x13/0x20
>> Jul  8 20:03:44 localhost kernel: [24001.080527]  [<ffffffff818276b2>]
>> entry_SYSCALL_64_fastpath+0x16/0x71
>
> I don't se how Ceph could be calling this as it appears to be a btrfs thread

Sorry, I meant, "I don't see how Ceph could be causing this..."

> within the kernel waiting for a lock (speculation based on function name).
> This is almost definitely at the system level. Someone would need to debug why
> the btrfs file system code is hanging at that point.
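
If it happens again, capturing the state of the stuck kernel threads while the hang is still in progress would help whoever debugs the btrfs side. A rough sketch, to be run on the affected OSD node (the exact pids will differ):

    # list tasks stuck in uninterruptible sleep (D state) and where they block
    ps -eo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /D/'

    # dump kernel stacks of all blocked tasks to the kernel log (needs sysrq enabled)
    echo w | sudo tee /proc/sysrq-trigger
    sudo dmesg -T | tail -n 200

That output, plus the kernel and btrfs versions, is what a btrfs bug report would normally need. It may also be worth noting that XFS, not btrfs, was the generally recommended FileStore filesystem for production clusters.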
>
> HTH,
> Brad
>
>> .
>> .
>> Jul  8 20:04:18 localhost ceph-osd[6462]: common/HeartbeatMap.cc: In
>> function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*,
>> const char*, time_t)' thread 7fcd52c50700 time 2016-07-08 20:04:18.254578
>> Jul  8 20:04:18 localhost ceph-osd[6462]: common/HeartbeatMap.cc: 86:
>> FAILED assert(0 == "hit suicide timeout")
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  ceph version 10.2.1
>> (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  1:
>> (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80)
>> [0x55815ee03d60]
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  2:
>> (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*,
>> long)+0x259) [0x55815ed3c199]
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  3:
>> (ceph::HeartbeatMap::is_healthy()+0xe6) [0x55815ed3cac6]
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  4:
>> (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x55815ed3d31c]
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  5:
>> (CephContextServiceThread::entry()+0x167) [0x55815ee1c5c7]
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  6: (()+0x76fa) [0x7fcd5775f6fa]
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  7: (clone()+0x6d)
>> [0x7fcd557dab5d]
>> Jul  8 20:04:18 localhost ceph-osd[6462]:  NOTE: a copy of the executable,
>> or `objdump -rdS <executable>` is needed to interpret this.
>> Jul  8 20:04:18 localhost ceph-osd[6462]: 2016-07-08 20:04:18.401770
>> 7fcd52c50700 -1 common/HeartbeatMap.cc: In function 'bool
>> ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*,
>> time_t)' thread 7fcd52c50700 time 2016-07-08 20:04:18.254578
>>
>> ==================================================================
>>
>> 3. Disk I/O is not under maximum stress
>> [image: inline image 1]
>>
>> ===================================================================
>>
>> 4. There is also no CPU or memory stress
>>
>> 5. The average write I/O is about 250 MB/s
>>  - We use Ceph as the root disk for OpenStack VMs
>>  - We are running write tests against 50 VMs with vdbench
>>
>> ====================================================================
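
While those vdbench runs are going, it may also be worth watching an OSD that starts logging heartbeat warnings before it suicides. A rough sketch (again using osd.15 from the log, run on its host):

    # requests currently in flight and the slowest recently completed ops
    sudo ceph daemon osd.15 dump_ops_in_flight
    sudo ceph daemon osd.15 dump_historic_ops

    # cluster-wide view of slow/blocked requests during the test
    ceph health detail

If the historic ops sit for a long time in states like "waiting for rw locks" or commit_sent right before the heartbeat warnings appear, that points at the same backend stall the kernel trace above shows.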
>>
>> Could anybody explain this issue and the exact reason for the OSD suicide?
>>
>> Thanks.
>>
>> John Haan.



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



