Hello,

during OSD restarts with Jewel (10.2.5 and 10.2.6 at least) I've seen
"stuck inactive for more than 300 seconds" errors like this when
observing things with "watch ceph -s":

---
     health HEALTH_ERR
            59 pgs are stuck inactive for more than 300 seconds
            223 pgs degraded
            74 pgs peering
            84 pgs stale
            59 pgs stuck inactive
            297 pgs stuck unclean
            223 pgs undersized
            recovery 38420/179352 objects degraded (21.422%)
            2/16 in osds are down
---

Now this is neither reflected in any logs, nor true, of course (the
restarts take a few seconds per OSD and the cluster is fully recovered
to HEALTH_OK in 12 seconds or so). But it surely is a good scare for
somebody not doing this on a test cluster.

Anybody else seeing this?

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
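
For anyone who wants to reproduce the observation, a minimal sketch;
this assumes systemd-managed OSDs (the restart command differs on
sysvinit/upstart setups) and uses osd.3 purely as an example ID:

---
# Terminal 1: watch the cluster health while the OSDs restart
watch ceph -s

# Terminal 2: restart a single OSD (osd.3 is an arbitrary example)
systemctl restart ceph-osd@3

# The HEALTH_ERR / "stuck inactive" lines flash up for a few seconds,
# then the cluster settles back to HEALTH_OK.
---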