Re: Ceph PGs stuck inactive after rebuild node


 



Thanks for the comments, I'll check the log files to see if there's any hint. Getting the PGs into an active state is one thing; I'm sure multiple approaches would have worked. The main question is why this happens at all: we have 19 hosts to rebuild and can't risk an application outage every time.

Was the PG stuck in the "activating" state? If so, I wonder if you temporarily exceeded mon_max_pg_per_osd on some OSDs when rebuilding your host. At least on Nautilus I've seen cases where Ceph doesn't gracefully recover from this temporary limit violation and the PGs need some nudges to become active.

I'm pretty sure that their cluster isn't anywhere near the mon_max_pg_per_osd limit; they currently have around 100 PGs per OSD and the configs haven't been touched, it's a pretty basic setup. This cluster was upgraded from Luminous to Nautilus a few months ago.
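For anyone wanting to rule this out on their own cluster, the configured limit and the live per-OSD PG counts can be compared directly. A sketch using standard Nautilus CLI commands (the `command -v` guard just makes it a no-op on machines without the ceph client):

```shell
# Only meaningful against a live cluster; skip gracefully otherwise.
if command -v ceph >/dev/null 2>&1; then
    # The configured ceiling (default 250 on Nautilus):
    ceph config get mon mon_max_pg_per_osd

    # Live PG count per OSD is in the PGS column of this output;
    # compare it against the limit above.
    ceph osd df
fi
```

If the PGS column sits around 100, as in this thread, a temporary breach during rebuild would still need roughly a 2.5x spike to hit the default limit.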

Quoting Anthony D'Atri <anthony.datri@xxxxxxxxx>:

Something worth a try before restarting an OSD in situations like this:

	ceph osd down 9

This marks the OSD down in the osdmap, but doesn’t touch the daemon.

Typically the subject OSD will see this and tell the mons “I’m not dead yet!” and repeer, which sometimes suffices to clear glitches.
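Putting that nudge into a checkable sequence, one can list the stuck PGs before and after marking the OSD down. A sketch (the OSD id 9 is the one from this thread; the PG id in the comment is a placeholder):

```shell
# Only meaningful against a live cluster; skip gracefully otherwise.
if command -v ceph >/dev/null 2>&1; then
    # List PGs stuck in an inactive state (e.g. "activating"):
    ceph pg dump_stuck inactive

    # Mark the OSD down in the osdmap only; the daemon keeps running,
    # notices, reports back to the mons, and repeers its PGs.
    ceph osd down 9

    # For a PG that stays inactive, inspect its peering state, e.g.:
    # ceph pg 1.2f query    # "1.2f" is a placeholder pgid
fi
```

Because the daemon is never stopped, this is a much gentler first step than a full OSD restart.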



Then I restarted OSD.9 and the inactive PG became active again.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx








