On Thu, Mar 9, 2017 at 3:04 AM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> During OSD restarts with Jewel (10.2.5 and 10.2.6 at least) I've seen
> "stuck inactive for more than 300 seconds" errors like this when
> observing things with "watch ceph -s":
> ---
>      health HEALTH_ERR
>             59 pgs are stuck inactive for more than 300 seconds
>             223 pgs degraded
>             74 pgs peering
>             84 pgs stale
>             59 pgs stuck inactive
>             297 pgs stuck unclean
>             223 pgs undersized
>             recovery 38420/179352 objects degraded (21.422%)
>             2/16 in osds are down
> ---
>
> Now this is neither reflected in any logs, nor true of course: the
> restarts take a few seconds per OSD and the cluster is fully recovered
> to HEALTH_OK in 12 seconds or so.
>
> But it surely is a good scare for somebody not doing this on a test
> cluster.
>
> Anybody else seeing this?

Definitely. "ceph -w" shows them as well. They indeed always clear after
a few seconds.

> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

Kind regards,

Ruben Kerkhof
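
P.S. If you want to convince yourself (or a nervous operator) that the
errors really are transient, a timestamped poll of "ceph health" around
the restart makes the brief HEALTH_ERR window and the return to
HEALTH_OK obvious. A rough sketch; the OSD id (0) and the 60-second
window are just examples, and the systemctl unit name assumes a
systemd-based Jewel install. Adjust to taste:

---
#!/bin/sh
# Restart one OSD, then log cluster health once per second for a
# minute so the transient HEALTH_ERR and the recovery both show up
# with timestamps.
systemctl restart ceph-osd@0    # example OSD id

for i in $(seq 1 60); do
    echo "$(date '+%H:%M:%S') $(ceph health)"
    sleep 1
done
---

On my clusters the HEALTH_ERR lines in that log span only a handful of
seconds before flipping back to HEALTH_OK, matching what Christian
describes.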