Hello,

after moving 4 SSDs to another host (plus the "ceph tell" hanging issue - see my previous mail), we ran into 241 unknown pgs:

  cluster:
    id:     1ccd84f6-e362-4c50-9ffe-59436745e445
    health: HEALTH_WARN
            noscrub flag(s) set
            2 nearfull osd(s)
            1 pool(s) nearfull
            Reduced data availability: 241 pgs inactive
            1532 slow requests are blocked > 32 sec
            789 slow ops, oldest one blocked for 1949 sec, daemons [osd.12,osd.14,osd.2,osd.20,osd.23,osd.25,osd.3,osd.33,osd.35,osd.50]... have slow ops.

  services:
    mon: 3 daemons, quorum black1,black2,black3 (age 97m)
    mgr: black2(active, since 96m), standbys: black1, black3
    osd: 85 osds: 85 up, 82 in; 118 remapped pgs
         flags noscrub
    rgw: 1 daemon active (admin)

  data:
    pools:   12 pools, 3000 pgs
    objects: 33.96M objects, 129 TiB
    usage:   388 TiB used, 159 TiB / 548 TiB avail
    pgs:     8.033% pgs unknown
             409151/101874117 objects misplaced (0.402%)
             2634 active+clean
             241  unknown
             107  active+remapped+backfill_wait
             11   active+remapped+backfilling
             7    active+clean+scrubbing+deep

  io:
    client:   91 MiB/s rd, 28 MiB/s wr, 1.76k op/s rd, 686 op/s wr
    recovery: 67 MiB/s, 17 objects/s

This started out at around 700+ unknown pgs; however, these 241 have now been stuck in this state for more than 1h. Below is a sample of pgs from "ceph pg dump all | grep unknown":

2.7f7 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.7c7 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.7c2 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.7ab 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.78b 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.788 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.76e 0

Running "ceph pg 2.7f7 query" hangs. We checked and one server did have an incorrect MTU setting (9204 instead of the correct 9000), but that was fixed some hours ago. Does anyone have a hint on how to find the OSDs behind those unknown pgs? (A sketch of the read-only checks we still have in mind follows after the version output below.)

Version-wise this is 14.2.9:

[20:42:20] black2.place6:~# ceph versions
{
    "mon": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 85
    },
    "mds": {},
    "rgw": {
        "ceph version 20200428-923-g4004f081ec (4004f081ec047d60e84d76c2dad6f31e2ac44484) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 91,
        "ceph version 20200428-923-g4004f081ec (4004f081ec047d60e84d76c2dad6f31e2ac44484) nautilus (stable)": 1
    }
}
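These are the read-only checks referred to above - only a sketch, and none of them has been verified to help yet. They go through the mon/mgr rather than the individual OSDs, so they should not hang the way "pg query" does (2.7f7 is just one example pg from the dump):

# map the pg via the osdmap/CRUSH only, without contacting the OSD
ceph pg map 2.7f7

# list every pg stuck inactive together with its last acting set
ceph pg dump_stuck inactive

# check that the 4 moved SSD OSDs ended up under the expected host/bucket
ceph osd tree

Since "unknown" just means the mgr has not received stats for those pgs, failing over the active mgr (e.g. "ceph mgr fail black2") might also be worth a try.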
From "ceph health detail":

[20:42:58] black2.place6:~# ceph health detail
HEALTH_WARN noscrub flag(s) set; 2 nearfull osd(s); 1 pool(s) nearfull; Reduced data availability: 241 pgs inactive; 1575 slow requests are blocked > 32 sec; 751 slow ops, oldest one blocked for 1986 sec, daemons [osd.12,osd.14,osd.2,osd.20,osd.23,osd.25,osd.3,osd.31,osd.33,osd.35]... have slow ops.
OSDMAP_FLAGS noscrub flag(s) set
OSD_NEARFULL 2 nearfull osd(s)
    osd.36 is near full
    osd.54 is near full
POOL_NEARFULL 1 pool(s) nearfull
    pool 'ssd' is nearfull
PG_AVAILABILITY Reduced data availability: 241 pgs inactive
    pg 2.82 is stuck inactive for 6027.042489, current state unknown, last acting []
    pg 2.88 is stuck inactive for 6027.042489, current state unknown, last acting []
    ...
    pg 19.6e is stuck inactive for 6027.042489, current state unknown, last acting []
    pg 20.69 is stuck inactive for 6027.042489, current state unknown, last acting []

As can be seen, multiple pools are affected, even though most of the missing pgs are from pool 2 (the number before the "." in a pg id is the pool id; see the PS below for a quick per-pool count).

Best regards,

Nico

--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
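PS: to double-check which pools the stuck pgs belong to, something like the following should give a per-pool count (an untested sketch, written from memory):

# pool id -> pool name
ceph osd lspools

# count stuck-inactive pgs per pool id (the part before the "." in each pg id)
ceph pg dump_stuck inactive 2>/dev/null | awk 'NR > 1 { split($1, a, "."); print a[1] }' | sort | uniq -c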