Have you found out a way to avoid that? Does it have to do with RBD discard settings for RBD clients? (A rough sketch of trimming in bounded chunks is appended at the end of this thread.) Here is another example, this time an OSD even got marked down:

2022-02-11 20:00:00.000105 [INF] overall HEALTH_OK
2022-02-11 19:59:33.579036 [INF] Cluster is now healthy
2022-02-11 19:59:33.578999 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1/9911260528 objects degraded (0.000%), 1 pg degraded)
2022-02-11 19:59:32.069502 [WRN] Health check update: Degraded data redundancy: 1/9911260528 objects degraded (0.000%), 1 pg degraded (PG_DEGRADED)
2022-02-11 19:59:27.900043 [INF] Health check cleared: SLOW_OPS (was: 894 slow ops, oldest one blocked for 39 sec, daemons [osd.279,osd.281,osd.549,osd.597,osd.598] have slow ops.)
2022-02-11 19:59:25.809557 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg peering)
2022-02-11 19:59:25.809522 [WRN] Health check update: 894 slow ops, oldest one blocked for 39 sec, daemons [osd.279,osd.281,osd.549,osd.597,osd.598] have slow ops. (SLOW_OPS)
2022-02-11 19:59:25.809489 [WRN] Health check update: Degraded data redundancy: 403785/9911253409 objects degraded (0.004%), 96 pgs degraded (PG_DEGRADED)
2022-02-11 19:59:20.277378 [WRN] Monitor daemon marked osd.598 down, but it is still running
2022-02-11 19:59:21.698182 [INF] osd.598 192.168.32.82:6809/4872 boot
2022-02-11 19:59:21.664476 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2022-02-11 19:59:19.234999 [WRN] Health check update: 1039 slow ops, oldest one blocked for 37 sec, daemons [osd.240,osd.242,osd.244,osd.245,osd.246,osd.247,osd.249,osd.250,osd.252,osd.253]... have slow ops. (SLOW_OPS)
2022-02-11 19:59:19.234978 [WRN] Health check failed: Degraded data redundancy: 120647/9911250435 objects degraded (0.001%), 10 pgs degraded (PG_DEGRADED)
2022-02-11 19:59:19.234945 [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)
2022-02-11 19:59:17.181317 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2022-02-11 19:59:16.286190 [INF] osd.598 failed (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-18) (3 reporters from different host after 23.189467 >= grace 20.777782)
2022-02-11 19:59:13.917369 [WRN] Health check failed: 5 slow ops, oldest one blocked for 31 sec, daemons [osd.573,osd.585,osd.606] have slow ops. (SLOW_OPS)
2022-02-11 19:26:08.195090 [INF] Cluster is now healthy
2022-02-11 19:26:08.195050 [INF] Health check cleared: SLOW_OPS (was: 15 slow ops, oldest one blocked for 47 sec, daemons [osd.278,osd.548,osd.598] have slow ops.)
2022-02-11 19:26:07.116281 [WRN] Health check update: 15 slow ops, oldest one blocked for 47 sec, daemons [osd.278,osd.548,osd.598] have slow ops. (SLOW_OPS)
2022-02-11 19:26:02.115736 [WRN] Health check update: 452 slow ops, oldest one blocked for 48 sec, daemons [osd.241,osd.242,osd.243,osd.246,osd.247,osd.248,osd.250,osd.251,osd.252,osd.253]... have slow ops. (SLOW_OPS)
2022-02-11 19:25:55.143277 [WRN] Health check update: 394 slow ops, oldest one blocked for 44 sec, daemons [osd.241,osd.242,osd.243,osd.246,osd.247,osd.248,osd.250,osd.251,osd.252,osd.253]... have slow ops. (SLOW_OPS)
2022-02-11 19:25:48.802544 [WRN] Health check update: 316 slow ops, oldest one blocked for 37 sec, daemons [osd.242,osd.243,osd.246,osd.247,osd.248,osd.250,osd.251,osd.252,osd.255,osd.257]... have slow ops. (SLOW_OPS)
2022-02-11 19:25:42.421675 [WRN] Health check failed: 12 slow ops, oldest one blocked for 32 sec, daemons [osd.243,osd.252,osd.560,osd.561,osd.589,osd.600] have slow ops. (SLOW_OPS)
2022-02-11 19:08:34.916055 [INF] Cluster is now healthy
2022-02-11 19:08:34.916018 [INF] Health check cleared: SLOW_OPS (was: 1 slow ops, oldest one blocked for 30 sec, osd.287 has slow ops)
2022-02-11 19:08:30.686147 [WRN] Health check failed: 1 slow ops, oldest one blocked for 30 sec, osd.287 has slow ops (SLOW_OPS)
2022-02-11 19:00:00.000157 [INF] overall HEALTH_OK

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: 黄俊艺 <york@xxxxxxxxxxxxx>
Sent: 11 February 2022 14:19:15
To: Frank Schilder; ceph-users
Subject: Re: IO stall after 1 slow op

Hello Frank,

We have observed a seemingly identical issue when `fstrim` is carried out on one of the RBD-backed iSCSI multipath devices (we use ceph-iscsi to map RBD images to local multipath devices, which are formatted with XFS). BTW, we use Nautilus 14.2.22.

------------------ Original ------------------
From: "Frank Schilder" <frans@xxxxxx>
Date: Thu, Feb 10, 2022 09:19 PM
To: "ceph-users" <ceph-users@xxxxxxx>
Subject: IO stall after 1 slow op

Dear cephers,

I'm observing something strange on our cluster. A couple of times a day I see a near-complete IO stall on our pools backing RBD devices used by libvirt/KVM. The scenario is always the same: 1 op is blocked, and 30-40 seconds later an avalanche of blocked ops piles up while collective IO comes to a grinding halt for 60-90 seconds. After this time, the first slow op finally completes and things start moving again. I have an example of such a blocked op below; I can't see anything special about it except its exceptionally long duration.

This phenomenon is randomly distributed over all drives and drive types (SSD, HDD) in the cluster, so I don't think it is a failing drive that causes this. I looked at the OSD logs and couldn't find anything unusual. There was no deep scrub, compaction or other activity in the log. It just seems that the OSD worker thread gets stuck for no apparent reason.

Has anyone seen something like that and/or has an idea how to address it? Thanks for any input!
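For anyone who wants to pull the same data from their own cluster: a dump like the one below can usually be obtained from the OSD's admin socket on the host that runs the daemon. A minimal sketch, with the OSD id purely as a placeholder (pick one of the daemons listed in the SLOW_OPS warning):

    # run on the host that carries the OSD; osd.598 is just a placeholder id
    ceph daemon osd.598 dump_ops_in_flight   # ops currently blocked, with their event timelines
    ceph daemon osd.598 dump_historic_ops    # recently completed ops; the long gap between
                                             # "sub_op_committed" and "commit_sent" shows up here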
Here is the blocked op:

    "description": "osd_op(client.49504448.0:25390069 2.6fs0 2:f6036592:::rbd_data.1.6ed7996b8b4567.0000000000000379:head [stat,write 1953792~81920] snapc 4f7ce=[4f7ce,4f4fc,4f225,4ef4c,4ec84,4e9b5,4e6d1,4e3fb,4e128,4db85,4c7cb,4b4a8,4a0b2] ondisk+write+known_if_redirected e732656)",
    "initiated_at": "2022-02-10 10:51:19.328624",
    "age": 1894.887633,
    "duration": 88.017855,
    "type_data": {
        "flag_point": "commit sent; apply or cleanup",
        "client_info": {
            "client": "client.49504448",
            "client_addr": "192.168.48.17:0/1008950287",
            "tid": 25390069
        },
        "events": [
            { "time": "2022-02-10 10:51:19.328624", "event": "initiated" },
            { "time": "2022-02-10 10:51:19.328624", "event": "header_read" },
            { "time": "2022-02-10 10:51:19.328625", "event": "throttled" },
            { "time": "2022-02-10 10:51:19.328704", "event": "all_read" },
            { "time": "2022-02-10 10:51:19.328719", "event": "dispatched" },
            { "time": "2022-02-10 10:51:19.328726", "event": "queued_for_pg" },
            { "time": "2022-02-10 10:51:19.329025", "event": "reached_pg" },
            { "time": "2022-02-10 10:51:19.329052", "event": "waiting for rw locks" },
            { "time": "2022-02-10 10:51:19.332192", "event": "reached_pg" },
            { "time": "2022-02-10 10:51:19.332199", "event": "waiting for rw locks" },
            { "time": "2022-02-10 10:51:19.521615", "event": "reached_pg" },
            { "time": "2022-02-10 10:51:19.521638", "event": "waiting for rw locks" },
            { "time": "2022-02-10 10:51:20.117640", "event": "reached_pg" },
            { "time": "2022-02-10 10:51:20.117679", "event": "started" },
            { "time": "2022-02-10 10:51:20.123190", "event": "sub_op_started" },
            { "time": "2022-02-10 10:51:20.124040", "event": "sub_op_committed" },
            { "time": "2022-02-10 10:52:47.346397", "event": "commit_sent" },
            { "time": "2022-02-10 10:52:47.346480", "event": "done" }
        ]
    }

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
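Coming back to the discard question at the top of this thread: since the stall is reportedly triggered by fstrim on RBD-backed devices, one mitigation worth trying (untested here, purely a sketch; the mount point, chunk size, minimum extent and sleep interval are made-up example values, not recommendations) is to issue the trim in bounded slices rather than one whole-device pass, so the resulting discards reach the OSDs in smaller bursts:

    #!/bin/bash
    # Sketch only: trim an RBD/iSCSI-backed filesystem in bounded slices
    # instead of one full-device fstrim. All values are illustrative.
    MOUNT=/srv/vmstore                     # hypothetical mount point of the XFS on the mapped device
    CHUNK=$((64 * 1024 * 1024 * 1024))     # 64 GiB per pass
    FSSIZE=$(df --block-size=1 --output=size "$MOUNT" | tail -n 1)

    offset=0
    while (( offset < FSSIZE )); do
        # -o/-l bound the byte range of this pass, -m skips extents below 4 MiB
        fstrim -v -o "$offset" -l "$CHUNK" -m 4M "$MOUNT"
        offset=$(( offset + CHUNK ))
        sleep 10                           # give the OSDs time to drain between passes
    done

Whether guests issue discards at all is a separate client-side setting (for libvirt/KVM, the discard='unmap' attribute on the disk driver element), so this only softens the bursts once discards are enabled; it does not address why a single slow op stalls the whole pool.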