Re: osd regularly wrongly marked down

We tried to raise osd_memory_target from 4 to 8 GB, but the problem still occurs (OSDs wrongly marked down a few times a day).
Does anybody have a clue?
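
For reference, applying the memory change through the centralized config
store looks roughly like this (a sketch; the value is in bytes, and the
exact way we set it on our side may differ):

# check the value a given OSD is currently running with
ceph config get osd.16 osd_memory_target
# raise it to 8 GiB for all OSDs
ceph config set osd osd_memory_target 8589934592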
F.


    On Fri, Aug 28, 2020 at 10:34 AM Francois Legrand
    <fleg@xxxxxxxxxxxxxx> wrote:

        Hi all,

        We have a Ceph cluster in production with 6 OSD servers (16x8TB
        disks each), 3 mons/mgrs and 3 MDSs. Both the public and cluster
        networks are 10 Gb/s and work well.

        After a major crash in April, we set bluefs_buffered_io to false
        to work around the large-write bug that occurs when
        bluefs_buffered_io is true (we were on version 14.2.8, where the
        default value was true).
        Since that change, we regularly have some OSDs wrongly marked
        down by the cluster after a heartbeat timeout (heartbeat_map
        is_healthy 'OSD::osd_op_tp thread 0x7f03f1384700' had timed out
        after 15).
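
        For anyone wanting to reproduce that change, something along
        these lines should do it (a sketch, assuming the centralized
        config store; depending on the release the OSDs may need a
        restart to pick the value up):

        # verify what a given OSD is actually running with
        ceph config get osd.16 bluefs_buffered_io
        # disable buffered I/O in BlueFS for all OSDs
        ceph config set osd bluefs_buffered_io false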

        Generally the OSD restarts and the cluster goes back to healthy,
        but several times, after many of these kick-offs, the OSD reached
        the osd_op_thread_suicide_timeout and went down for good.

        We increased osd_op_thread_timeout and
        osd_op_thread_suicide_timeout... The problem still occurs (but
        less frequently).
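
        As an illustration, such an increase can be applied along these
        lines (the 60/600 second values are placeholders, not
        necessarily what we used; the defaults are 15 and 150):

        # give op threads more time before the heartbeat_map complains
        ceph config set osd osd_op_thread_timeout 60
        # and more time before the suicide timeout makes the OSD abort
        ceph config set osd osd_op_thread_suicide_timeout 600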

        A few days ago we upgraded to 14.2.11 and reverted the timeouts
        to their default values, hoping that this would solve the
        problem (we thought it might be related to this bug:
        https://tracker.ceph.com/issues/45943), but it didn't. We still
        have some OSDs wrongly marked down.
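
        Reverting was then just a matter of dropping the overrides
        again, roughly (assuming they had been set in the mon config
        database rather than in ceph.conf):

        ceph config rm osd osd_op_thread_timeout
        ceph config rm osd osd_op_thread_suicide_timeout
        # check that nothing non-default is left for these options
        ceph config dump | grep osd_op_thread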

        Can somebody help us fix this problem?
        Thanks.

        Here is an extract of an OSD log at failure time:

        ---------------------------------
        2020-08-28 02:19:05.019 7f03f1384700  0 log_channel(cluster) log [DBG] : 44.7d scrub starts
        2020-08-28 02:19:25.755 7f040e43d700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
        2020-08-28 02:19:25.755 7f040dc3c700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
        [this last line is repeated more than 1000 times]
        ...
        2020-08-28 02:20:17.484 7f040d43b700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
        2020-08-28 02:20:17.551 7f03f1384700  0 bluestore(/var/lib/ceph/osd/ceph-16) log_latency_fn slow operation observed for _collection_list, latency = 67.3532s, lat = 67s cid =44.7d_head start GHMAX end GHMAX max 25
        ...
        2020-08-28 02:20:22.600 7f040dc3c700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
        2020-08-28 02:21:20.774 7f03f1384700  0 bluestore(/var/lib/ceph/osd/ceph-16) log_latency_fn slow operation observed for _collection_list, latency = 63.223s, lat = 63s cid =44.7d_head start #44:beffc78d:::rbd_data.1e48e8ab988992.00000000000011bd:0# end #MAX# max 2147483647
        2020-08-28 02:21:20.774 7f03f1384700  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
        2020-08-28 02:21:20.805 7f03f1384700  0 log_channel(cluster) log [DBG] : 44.7d scrub ok
        2020-08-28 02:21:21.099 7f03fd997700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.16 down, but it is still running
        2020-08-28 02:21:21.099 7f03fd997700  0 log_channel(cluster) log [DBG] : map e609411 wrongly marked me down at e609410
        2020-08-28 02:21:21.099 7f03fd997700  1 osd.16 609411 start_waiting_for_healthy
        2020-08-28 02:21:21.119 7f03fd997700  1 osd.16 609411 start_boot
        2020-08-28 02:21:21.124 7f03f0b83700  1 osd.16 pg_epoch: 609410 pg[36.3d0( v 609409'481293 (449368'478292,609409'481293] local-lis/les=609403/609404 n=154651 ec=435353/435353 lis/c 609403/609403 les/c/f 609404/609404/0 609410/609410/608752) [25,72] r=-1 lpr=609410 pi=[609403,609410)/1 luod=0'0 lua=609392'481198 crt=609409'481293 lcod 609409'481292 active mbc={}] start_peering_interval up [25,72,16] -> [25,72], acting [25,72,16] -> [25,72], acting_primary 25 -> 25, up_primary 25 -> 25, role 2 -> -1, features acting 4611087854031667199 upacting 4611087854031667199
        ...
        2020-08-28 02:21:21.166 7f03f0b83700  1 osd.16 pg_epoch: 609411 pg[36.56( v 609409'480511 (449368'477424,609409'480511] local-lis/les=609403/609404 n=153854 ec=435353/435353 lis/c 609403/609403 les/c/f 609404/609404/0 609410/609410/609410) [103,102] r=-1 lpr=609410 pi=[609403,609410)/1 crt=609409'480511 lcod 609409'480510 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
        2020-08-28 02:21:21.307 7f04073b0700  1 osd.16 609413 set_numa_affinity public network em1 numa node 0
        2020-08-28 02:21:21.307 7f04073b0700  1 osd.16 609413 set_numa_affinity cluster network em2 numa node 0
        2020-08-28 02:21:21.307 7f04073b0700  1 osd.16 609413 set_numa_affinity objectstore and network numa nodes do not match
        2020-08-28 02:21:21.307 7f04073b0700  1 osd.16 609413 set_numa_affinity not setting numa affinity
        2020-08-28 02:21:21.566 7f040a435700  1 osd.16 609413 tick checking mon for new map
        2020-08-28 02:21:22.515 7f03fd997700  1 osd.16 609414 state: booting -> active
        2020-08-28 02:21:22.515 7f03f0382700  1 osd.16 pg_epoch: 609414 pg[36.20( v 609409'483167 (449368'480117,609409'483167] local-lis/les=609403/609404 n=155171 ec=435353/435353 lis/c 609403/609403 les/c/f 609404/609404/0 609414/609414/609361) [97,16,72] r=1 lpr=609414 pi=[609403,609414)/1 crt=609409'483167 lcod 609409'483166 unknown NOTIFY mbc={}] start_peering_interval up [97,72] -> [97,16,72], acting [97,72] -> [97,16,72], acting_primary 97 -> 97, up_primary 97 -> 97, role -1 -> 1, features acting 4611087854031667199 upacting 4611087854031667199
        ...
        2020-08-28 02:21:22.522 7f03f1384700  1 osd.16 pg_epoch: 609414 pg[36.2f3( v 609409'479796 (449368'476712,609409'479796] local-lis/les=609403/609404 n=154451 ec=435353/435353 lis/c 609403/609403 les/c/f 609404/609404/0 609414/609414/609414) [16,34,21] r=0 lpr=609414 pi=[609403,609414)/1 crt=609409'479796 lcod 609409'479795 mlcod 0'0 unknown NOTIFY mbc={}] start_peering_interval up [34,21] -> [16,34,21], acting [34,21] -> [16,34,21], acting_primary 34 -> 16, up_primary 34 -> 16, role -1 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
        2020-08-28 02:21:22.522 7f03f1384700  1 osd.16 pg_epoch: 609414 pg[36.2f3( v 609409'479796 (449368'476712,609409'479796] local-lis/les=609403/609404 n=154451 ec=435353/435353 lis/c 609403/609403 les/c/f 609404/609404/0 609414/609414/609414) [16,34,21] r=0 lpr=609414 pi=[609403,609414)/1 crt=609409'479796 lcod 609409'479795 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
        2020-08-28 02:21:24.738 7f03f1384700  0 log_channel(cluster) log [DBG] : 36.2f3 scrub starts
        2020-08-28 02:22:18.857 7f03f1384700  0 log_channel(cluster) log [DBG] : 36.2f3 scrub ok
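
        The pattern is always the same: a scrub starts, _collection_list
        takes more than a minute, the osd_op_tp thread misses its
        heartbeats, and the mon marks the OSD down even though it is
        still running. A quick way to see how widespread this is would
        be to count those slow operations in the OSD logs and look at
        blocked ops on a suspect OSD (a sketch, assuming the default log
        location, to be run on each OSD host):

        # count slow _collection_list events per OSD log on this host
        grep -c 'log_latency_fn slow operation observed for _collection_list' /var/log/ceph/ceph-osd.*.log
        # dump ops currently blocked on a suspect OSD (here osd.16)
        ceph daemon osd.16 dump_blocked_ops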


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



