We had an OSD host with 13 OSDs fail today, and we have a weird blocked OP message that I can't understand. There are no OSDs with blocked ops, just `mon` (multiple times), and some of the rgw instances.

  cluster:
    id:     570bcdbb-9fdf-406f-9079-b0181025f8d0
    health: HEALTH_WARN
            1 large omap objects
            Degraded data redundancy: 2083023/195702437 objects degraded (1.064%), 880 pgs degraded, 880 pgs undersized
            1609 pgs not deep-scrubbed in time
            4 slow ops, oldest one blocked for 506699 sec, daemons [mon,sun-gcs02-rgw01,mon,sun-gcs02-rgw02,mon,sun-gcs02-rgw03] have slow ops.

  services:
    mon: 3 daemons, quorum sun-gcs02-rgw01,sun-gcs02-rgw02,sun-gcs02-rgw03 (age 6m)
    mgr: sun-gcs02-rgw02(active, since 5d), standbys: sun-gcs02-rgw03, sun-gcs02-rgw04
    osd: 767 osds: 754 up (since 10m), 754 in (since 104m); 880 remapped pgs
    rgw: 16 daemons active (sun-gcs02-rgw01.rgw0, sun-gcs02-rgw01.rgw1, sun-gcs02-rgw01.rgw2, sun-gcs02-rgw01.rgw3, sun-gcs02-rgw02.rgw0, sun-gcs02-rgw02.rgw1, sun-gcs02-rgw02.rgw2, sun-gcs02-rgw02.rgw3, sun-gcs02-rgw03.rgw0, sun-gcs02-rgw03.rgw1, sun-gcs02-rgw03.rgw2, sun-gcs02-rgw03.rgw3, sun-gcs02-rgw04.rgw0, sun-gcs02-rgw04.rgw1, sun-gcs02-rgw04.rgw2, sun-gcs02-rgw04.rgw3)

  data:
    pools:   7 pools, 8240 pgs
    objects: 19.57M objects, 52 TiB
    usage:   88 TiB used, 6.1 PiB / 6.2 PiB avail
    pgs:     2083023/195702437 objects degraded (1.064%)
             43492/195702437 objects misplaced (0.022%)
             7360 active+clean
             868  active+undersized+degraded+remapped+backfill_wait
             12   active+undersized+degraded+remapped+backfilling

  io:
    client:   150 MiB/s rd, 642 op/s rd, 0 op/s wr
    recovery: 626 MiB/s, 223 objects/s

$ ceph versions
{
    "mon": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 754
    },
    "mds": {},
    "rgw": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 16
    },
    "overall": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 754,
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 22
    }
}

I restarted one of the monitors and it dropped out of the list, leaving only 2 blocked ops, but it showed up again a little while later.

Any ideas on where to look?

Thanks,
Robert LeBlanc

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1