Hi all,

we are running Ceph 14.2.2 from https://mirror.croit.io/debian-nautilus/ on Debian Buster and are having problems with CephFS. Processes on the mounted file system hang because placement groups get stuck inactive. This often happens after I mark single OSDs out manually. A typical result looks like this:

HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs behind on trimming; Reduced data availability: 4 pgs inactive
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsmds1(mds.0): 4 slow metadata IOs are blocked > 30 secs, oldest blocked for 51206 secs
MDS_TRIM 1 MDSs behind on trimming
    mdsmds1(mds.0): Behind on trimming (4298/128) max_segments: 128, num_segments: 4298
PG_AVAILABILITY Reduced data availability: 4 pgs inactive
    pg 21.1f9 is stuck inactive for 52858.655306, current state remapped, last acting [8,2147483647,2147483647,26,27,11]
    pg 21.22f is stuck inactive for 52858.636207, current state remapped, last acting [27,26,4,2147483647,15,2147483647]
    pg 21.2b5 is stuck inactive for 52865.857165, current state remapped, last acting [6,2147483647,21,27,11,2147483647]
    pg 21.3ed is stuck inactive for 52865.852710, current state remapped, last acting [26,18,14,20,2147483647,2147483647]

The affected placement groups belong to an erasure coded pool:

# ceph osd erasure-code-profile get CLAYje4_2_5
crush-device-class=
crush-failure-domain=host
crush-root=default
d=5
k=4
m=2
plugin=clay

Restarting the primary OSD of the stuck PGs brings them back to an active state (the exact commands I use are in the P.S. below).

This problem keeps us from putting the cluster into production. I'm still a beginner with Ceph and the cluster is still in its testing phase. What am I doing wrong? Is this a symptom of using the clay erasure code?

Thanks
Lars
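
P.S. In case the details matter, this is roughly what I do to get a stuck PG active again. PG 21.1f9 and osd.8 are just taken from the health output above as an example; the IDs differ every time, and the systemd unit name is the one from the plain Debian packages:

ceph pg map 21.1f9               # shows the up/acting sets; the first OSD in the acting set is the primary
ceph osd find 8                  # shows which host osd.8 is running on
systemctl restart ceph-osd@8     # restart that OSD daemon on its host

After the restart the PG peers and goes active again.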