Hi,

On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
> availability: pgs peering'. At the same time some VMs hung as described
> before.
Just a wild guess: you have 71 OSDs and about 4500 PGs with size=3. That is 13500 PG instances overall, or roughly 190 PGs per OSD under normal circumstances.
If one host is down and the PGs have to re-peer, some of the remaining OSDs might reach the limit of 200 PGs per OSD, which would leave peering stuck.
You can try to raise this limit. There are several threads on the mailing list about this.
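To make the arithmetic above concrete, here is a rough sketch in Python. It only looks at cluster-wide averages and assumes that every PG instance ends up on one of the remaining OSDs; the real CRUSH distribution is uneven, so individual OSDs can hit the limit earlier. The function names are made up for illustration, and the limit of 200 is simply the figure mentioned above.

```python
import math


def pgs_per_osd(pg_count, replica_size, osd_count):
    """Average number of PG instances (replicas) per OSD."""
    return pg_count * replica_size / osd_count


def max_osds_down_within_limit(pg_count, replica_size, osd_count, limit):
    """How many OSDs can be missing before the *average* exceeds the limit.

    Assumption: all PG instances are spread evenly over the remaining OSDs.
    """
    total_instances = pg_count * replica_size
    min_osds_needed = math.ceil(total_instances / limit)
    return osd_count - min_osds_needed


# Figures from this thread.
PG_COUNT = 4500
REPLICA_SIZE = 3
OSD_COUNT = 71
PG_LIMIT = 200

normal = pgs_per_osd(PG_COUNT, REPLICA_SIZE, OSD_COUNT)
headroom = max_osds_down_within_limit(PG_COUNT, REPLICA_SIZE, OSD_COUNT, PG_LIMIT)

print(f"average PGs per OSD with all OSDs up: {normal:.1f}")
print(f"OSDs that can be down before the average passes {PG_LIMIT}: {headroom}")

# Hypothetical host sizes, only to show how quickly the average grows.
for osds_on_host in (2, 4, 6, 8):
    remaining = OSD_COUNT - osds_on_host
    avg = pgs_per_osd(PG_COUNT, REPLICA_SIZE, remaining)
    print(f"{osds_on_host} OSDs down -> {avg:.1f} PGs per remaining OSD")
```

With these numbers the average already passes 200 once four OSDs are missing, so a host carrying more than a handful of OSDs would be enough to push some OSDs over the limit.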
Regards,
Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810