Dear all,

We're running Ceph Luminous and recently hit an issue with some OSDs (OSDs being marked out automatically, I/O and CPU overload) which unfortunately left one placement group in the state "stale+active+clean". It is a placement group from the .rgw.root pool:

  1.15 0 0 0 0 0 0 1 1 stale+active+clean 2020-05-11 23:22:51.396288 40'1 2142:152 [3,2,6] 3 [3,2,6] 3 40'1 2020-04-22 00:46:05.904418 40'1 2020-04-20 20:18:13.371396 0

I guess there is no active replica of that pg anywhere in the cluster. Restarting the osd.3, osd.2 or osd.6 daemons does not help.

I've used ceph-objectstore-tool to successfully export the placement group from osd.3, osd.2 and osd.6 and tried to import it on a completely different OSD. The exports differ slightly in file size; the one from osd.3, which was the latest primary, is the biggest, so I tried to import that one on a different OSD. When that OSD starts up I see the following (this is from osd.1):

  2020-05-14 21:43:19.779740 7f7880ac3700 1 osd.1 pg_epoch: 2459 pg[1.15( v 40'1 (0'0,40'1] local-lis/les=2073/2074 n=0 ec=73/39 lis/c 2073/2073 les/c/f 2074/2074/633 2145/39/2145) [] r=-1 lpr=2455 crt=40'1 lcod 0'0 unknown NOTIFY] state<Start>: transitioning to Stray

From previous pg dumps (taken several weeks earlier, while the pg was still active+clean) I see it held 115 bytes and zero objects, but I am not sure how to interpret that.

As this is a pg from the .rgw.root pool, I cannot get any response from the cluster when accessing it (everything times out).

What is the correct course of action with this pg? Any help would be greatly appreciated.

Thanks,
Tomislav
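
P.S. For reference, the export/import was driven roughly like this (a sketch only; the data paths assume the default OSD layout and the export file name is illustrative, and each OSD daemon was stopped while ceph-objectstore-tool ran against it):

  # export pg 1.15 from a stopped OSD (repeated for osd.3, osd.2 and osd.6)
  systemctl stop ceph-osd@3
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --pgid 1.15 --op export --file /root/pg1.15.osd3.export
  systemctl start ceph-osd@3

  # import the export taken from osd.3 into a different, stopped OSD (osd.1)
  systemctl stop ceph-osd@1
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
      --op import --file /root/pg1.15.osd3.export
  systemctl start ceph-osd@1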