PGs stuck unclean active+remapped

Hi all,

 

I’m hoping some of you have experience dealing with this, as unfortunately this is the first time we have encountered this issue.

We currently have placement groups that are stuck unclean, with ‘active+remapped’ as their last state.

 

The rundown of what happened:

 

Yesterday morning, one of our network engineers was working on some LACP bonds on the same switch stack that also houses this cluster’s internal and public Ceph networks.

Unfortunately the engineer also accidentally touched the LACP bonds of all 3 monitor servers, and issues started to appear.

 

In rapid succession we started losing OSDs, one by one, and rebalancing/recovery kicked in.

As connectivity between the monitor servers appeared OK (ping connectivity was somehow still there, a quorum was still visible, and ceph commands worked on all three), we didn’t suspect the monitor servers at first.

 

When investigating the OSDs that were marked down, their logs were full of the error messages below:

 

                    - monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early

                    - auth: could not find secret_id

                    - cephx: verify_authorizer could not get service secret for service osd secret_id

                    - x.x.x.x:6801/1258067 >> x.x.x.x:0/1115346558 pipe(0x560136fd8800 sd=706 :6801 s=0 pgs=0 cs=0 l=1 c=0x560122778e80).accept: got bad authorizer
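
For reference, this is roughly how we spotted those messages on the affected nodes (log path as per the default packaging; the pattern simply matches the lines quoted above):

    # grep the OSD logs for the auth/clock related errors
    grep -E 'clock skew|could not find secret_id|bad authorizer' /var/log/ceph/ceph-osd.*.log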

 

We suspected a time-sync issue, but the clocks turned out to be fine.
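
For completeness, the time checks we did were along these lines (assuming ntpd; with chrony it would be ‘chronyc sources’ instead):

    # check NTP peer status on each mon and OSD node
    ntpq -p
    # the cluster itself also reports mon clock skew here
    ceph health detail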

As more and more OSDs started failing, we changed the CRUSH map to add 2 additional OSD nodes (which were not housing any data at that moment) to the affected pools, but the same messages kept appearing on these OSDs as well.
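
For reference, the CRUSH map edit was done along these lines (a sketch; the actual bucket/rule changes are omitted):

    # export, decompile, edit, recompile and inject the CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # ... add the 2 new OSD hosts to the relevant root/ruleset in crushmap.txt ...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new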

In the meantime enough OSDs were down that everything ground to a halt.

 

After we found out about the LACP bond changes, they were reverted and all OSDs came up again.

Unfortunately, after some time the rebalance/recovery stopped, and the status now gives the following information:

 

    health HEALTH_WARN

            1088 pgs stuck unclean

            recovery 92/1073206 objects degraded (0.009%)

            recovery 53092/1073206 objects misplaced (4.947%)

            nodeep-scrub,sortbitwise,require_jewel_osds flag(s) set

     monmap e1: 3 mons at {srv-ams3-cmon-01=192.168.152.3:6789/0,srv-ams3-cmon-02=192.168.152.4:6789/0,srv-ams3-cmon-03=192.168.152.5:6789/0}

            election epoch 5152, quorum 0,1,2 srv-ams3-cmon-01,srv-ams3-cmon-02,srv-ams3-cmon-03

     osdmap e30517: 39 osds: 39 up, 39 in; 1088 remapped pgs

            flags nodeep-scrub,sortbitwise,require_jewel_osds

      pgmap v20285289: 2340 pgs, 22 pools, 2056 GB data, 524 kobjects

            4057 GB used, 12409 GB / 16466 GB avail

            92/1073206 objects degraded (0.009%)

            53092/1073206 objects misplaced (4.947%)

                1252 active+clean

                1088 active+remapped
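
To list which PGs are affected we have been using commands along these lines:

    # show the stuck PGs and their up/acting sets
    ceph pg dump_stuck unclean
    ceph health detail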

 

There does not seem to be any issue preventing continued service to the connected clients, but when querying such a placement group it shows that (the exact query is shown after the list):

- 2 OSDs are acting (the pools have a replication size of 2 at the moment)

- 1 OSD is primary

- Both OSDs are listed as values for ‘actingbackfill’

- up_primary has the value ‘-1’

- No OSD is up
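
The query itself was simply (the PG id is a placeholder for one of the 1088 remapped PGs):

    # query one of the stuck PGs and look at up, acting, up_primary and actingbackfill
    ceph pg <pgid> query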

 

We already tried reweighting the affected primary OSD, but the affected placement groups were not touched by the rebalance.
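
The reweighting was done roughly like this (osd id and weights are just examples):

    # CRUSH weight of the primary OSD of an affected PG
    ceph osd crush reweight osd.12 1.0
    # and/or the override ("in") weight
    ceph osd reweight 12 0.95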

Restarting the OSDs also did not have any effect.
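
(In case it matters, these were plain daemon restarts on a systemd-based setup; the osd id is an example:)

    systemctl restart ceph-osd@12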

We even tried ‘ceph osd crush tunables optimal’, but as we already expected, it did not have any effect.
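
For reference, the tunables before and after can be checked with:

    ceph osd crush show-tunables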

 

Sorry for the long read, but does anyone have an idea of what we could try?

I did read about setting ‘osd_find_best_info_ignore_history_les’ to true, but I’m not sure what the implications of using this setting would be.
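
If we were to try it, I assume it would be applied roughly like this on the relevant OSD nodes, followed by a restart of the primary OSDs of the affected PGs, but please correct me if that is wrong:

    # ceph.conf snippet (my assumption of how this option is set)
    [osd]
        osd_find_best_info_ignore_history_les = true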

Additionally, we disabled deep scrubbing (nodeep-scrub) during the recovery; could this be something a deep scrub would fix?
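
I.e. something like re-enabling it and deep-scrubbing one of the affected PGs (PG id is a placeholder):

    ceph osd unset nodeep-scrub
    ceph pg deep-scrub <pgid>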

 

Thanks in advance!

 

Roel

 

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
