Hi Kevin,
Unfortunately restarting the OSDs doesn't appear to help; instead it seems to make things worse, with PGs getting stuck degraded.
Best regards
/Magnus
2018-07-11 20:46 GMT+02:00 Kevin Olbrich <ko@xxxxxxx>:
Sounds a little bit like the problem I had on my OSDs, see the earlier thread "Blocked requests activating+remapped after extending pg(p)_num".

I ended up restarting the OSDs which were stuck in that state and they immediately fixed themselves. It should also work to just "out" the problem OSDs and immediately bring them back in again to fix it.
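Roughly something like this, untested against your cluster and assuming a systemd-based deployment (substitute the id of a stuck OSD for <id>):

    # restart the OSD daemon that the stuck PGs map to
    systemctl restart ceph-osd@<id>

    # or mark it out and bring it straight back in
    ceph osd out <id>
    ceph osd in <id>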
- Kevin

2018-07-11 20:30 GMT+02:00 Magnus Grönlund <magnus@xxxxxxxxxxx>:

Hi,

Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminous (12.2.6). After upgrading and restarting the mons everything looked OK, the mons had quorum, all OSDs were up and in and all the PGs were active+clean. But before I had time to start upgrading the OSDs it became obvious that something had gone terribly wrong.

All of a sudden 1600 out of 4100 PGs were inactive and 40% of the data was misplaced!

The mons appear OK and all OSDs are still up and in, but a few hours later there were still 1483 PGs stuck inactive, essentially all of them in peering. Investigating one of the stuck PGs, it appears to be looping between "inactive", "remapped+peering" and "peering", and the epoch number is rising fast; see the attached pg query outputs (the commands used are sketched after the status output below).

We really can't afford to lose the cluster or the data, so any help or suggestions on how to debug or fix this issue would be very, very appreciated!

  health: HEALTH_ERR
          1483 pgs are stuck inactive for more than 60 seconds
          542 pgs backfill_wait
          14 pgs backfilling
          11 pgs degraded
          1402 pgs peering
          3 pgs recovery_wait
          11 pgs stuck degraded
          1483 pgs stuck inactive
          2042 pgs stuck unclean
          7 pgs stuck undersized
          7 pgs undersized
          111 requests are blocked > 32 sec
          10586 requests are blocked > 4096 sec
          recovery 9472/11120724 objects degraded (0.085%)
          recovery 1181567/11120724 objects misplaced (10.625%)
          noout flag(s) set
          mon.eselde02u32 low disk space

  services:
    mon: 3 daemons, quorum eselde02u32,eselde02u33,eselde02u34
    mgr: eselde02u32(active), standbys: eselde02u33, eselde02u34
    osd: 111 osds: 111 up, 111 in; 800 remapped pgs
         flags noout

  data:
    pools:   18 pools, 4104 pgs
    objects: 3620k objects, 13875 GB
    usage:   42254 GB used, 160 TB / 201 TB avail
    pgs:     1.876% pgs unknown
             34.259% pgs not active
             9472/11120724 objects degraded (0.085%)
             1181567/11120724 objects misplaced (10.625%)
             2062 active+clean
             1221 peering
             535  active+remapped+backfill_wait
             181  remapped+peering
             77   unknown
             13   active+remapped+backfilling
             7    active+undersized+degraded+remapped+backfill_wait
             4    remapped
             3    active+recovery_wait+degraded+remapped
             1    active+degraded+remapped+backfilling

  io:
    recovery: 298 MB/s, 77 objects/s
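In case it helps, this is roughly how I have been looking at the stuck PGs (the <pgid> below is just a placeholder, not one of our actual PG ids):

    # list the PGs that are stuck inactive
    ceph pg dump_stuck inactive

    # query one of them to watch which states it loops through
    ceph pg <pgid> query

    # overall detail, including the blocked requests
    ceph health detail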
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com