Re: Troubleshooting hanging storage backend whenever there is any cluster change

Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Fri, 12 Oct 2018 14:03:06 +0200

Hi,

On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
> availability: pgs peering'. At the same time some VMs hung as described
> before.

Just a wild guess: you have 71 OSDs and about 4500 PGs with size=3.
That makes 13500 PG instances overall, or ~190 PGs per OSD under
normal circumstances.
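
A rough sketch of that arithmetic in Python, in case you want to plug in
other pool settings. The function names are made up for illustration, and
the calculation assumes PGs are spread perfectly evenly across OSDs, which
CRUSH does not guarantee:

```python
def pg_instances(pg_count: int, replica_size: int) -> int:
    """Total PG instances (replicas) in the cluster: PGs times pool size."""
    return pg_count * replica_size


def avg_pgs_per_osd(pg_count: int, replica_size: int, osd_count: int) -> float:
    """Average PG instances per OSD, assuming a perfectly even distribution."""
    if osd_count <= 0:
        raise ValueError("osd_count must be positive")
    return pg_instances(pg_count, replica_size) / osd_count


if __name__ == "__main__":
    # Numbers from this thread: about 4500 PGs, size=3, 71 OSDs.
    total = pg_instances(4500, 3)
    average = avg_pgs_per_osd(4500, 3, 71)
    print(f"{total} PG instances, about {average:.0f} PGs per OSD")
```

Individual OSDs can sit well above this average, so the real headroom on
the fullest OSD is smaller than the average suggests.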

If one host is down and the PGs have to re-peer, some of the OSDs
might reach the limit of 200 PGs per OSD, resulting in stuck peering.
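
To illustrate how little headroom that leaves, here is a small Python
sketch. It is a simplification under stated assumptions: all PG instances
are redistributed evenly over the remaining OSDs, the limit is the 200 PGs
per OSD mentioned above, and the 8 OSDs per host is a hypothetical figure,
since the host layout was not given in the thread. The function names are
illustrative only:

```python
import math


def pgs_per_osd_after_loss(total_pg_instances: int, osd_count: int,
                           osds_down: int) -> float:
    """Average PG instances per remaining OSD after osds_down OSDs are lost."""
    remaining = osd_count - osds_down
    if remaining <= 0:
        raise ValueError("at least one OSD must remain up")
    return total_pg_instances / remaining


def max_osds_down_within_limit(total_pg_instances: int, osd_count: int,
                               limit: int) -> int:
    """Largest number of lost OSDs that keeps the average at or below limit."""
    # Fewest OSDs that can hold all instances without exceeding the limit.
    min_remaining = math.ceil(total_pg_instances / limit)
    return max(osd_count - min_remaining, 0)


if __name__ == "__main__":
    total = 13500        # 4500 PGs * size 3, from the thread
    osds = 71            # from the thread
    limit = 200          # per-OSD PG limit mentioned in the thread
    osds_per_host = 8    # HYPOTHETICAL: host layout was not stated

    after = pgs_per_osd_after_loss(total, osds, osds_per_host)
    print(f"One host down: about {after:.0f} PGs per OSD (limit {limit})")
    print("OSDs that can be lost before the average exceeds the limit:",
          max_osds_down_within_limit(total, osds, limit))
```

Under these assumptions even the average crosses the limit once more than
a few OSDs are gone, and unevenly loaded OSDs would cross it sooner.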

You can try to raise this limit. There are several threads on the 
mailing list about this.

Regards,
Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810
