On 28-01-16 11:48, Piotr.Dalek@xxxxxxxxxxxxxx wrote:
> Hello,
>
> I hadn't noticed it before, but since merging https://github.com/ceph/ceph/pull/7253 I see that restarting daemons on a healthy Ceph cluster puts it into HEALTH_ERR state with "$(random_number) pgs are stuck inactive for more than 300 seconds".
> I looked at the commit, and it turns out this will always happen on restart/boot: booting PGs are inactive "by default" (the mons have never received any sign of life from them), not because they are actually stuck inactive.

Well, in that case, isn't the PR correct? But I see what you mean.

> One solution would be to set the pg_stat.last_* fields to the time the PG was first seen, so PGs become stuck only mon_pg_stuck_threshold seconds after first registering, not right away.

That sounds like a good solution. You might want to take a look at: http://tracker.ceph.com/issues/14028

> Another, less invasive one is to simply let the user disable this warning.

As you can see in the discussion on GitHub, we decided to set 'mon_pg_min_inactive' to 1 by default. You can disable these errors by setting it to zero, or raise it to something like 10 so it only triggers when more PGs are inactive.

It is there so that people are informed when PGs are inactive. Sitting in WARN state while the cluster is not performing I/O is a bad thing. WARN should mean you take a look but aren't worried. If I/O stops, going into ERR is the right thing.

Wido

> What do you think?
>
> With best regards / Pozdrawiam
> Piotr Dałek
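To make Piotr's proposal concrete, here is a minimal Python sketch of the stuck-inactive check, not the actual monitor code. Assumptions (not taken from Ceph's source): a PG counts as stuck inactive when it is inactive and `now - last_active` exceeds mon_pg_stuck_threshold (300 s, matching the message above); health goes to ERR once the number of stuck PGs reaches mon_pg_min_inactive, with 0 meaning "never". `PGStat`, `first_seen`, `stuck_inactive` and `health` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical defaults: 300 s matches the "more than 300 seconds" message,
# and mon_pg_min_inactive = 1 is the default discussed in this thread.
MON_PG_STUCK_THRESHOLD = 300.0
MON_PG_MIN_INACTIVE = 1


@dataclass
class PGStat:
    active: bool
    last_active: float  # seconds; last time the mons saw this PG active


def first_seen(now: float) -> PGStat:
    # Proposed fix (assumption): seed last_active with the first-seen time
    # instead of leaving it at 0, so a booting PG is not instantly "stuck".
    return PGStat(active=False, last_active=now)


def stuck_inactive(pgs: List[PGStat], now: float,
                   threshold: float = MON_PG_STUCK_THRESHOLD) -> List[PGStat]:
    # A PG is stuck if it is inactive and has not been active for > threshold.
    return [pg for pg in pgs if not pg.active and now - pg.last_active > threshold]


def health(pgs: List[PGStat], now: float,
           min_inactive: int = MON_PG_MIN_INACTIVE) -> str:
    # Only the stuck-inactive check is modelled; everything else is ignored.
    stuck = len(stuck_inactive(pgs, now))
    if min_inactive > 0 and stuck >= min_inactive:
        return "HEALTH_ERR"
    return "HEALTH_OK"


if __name__ == "__main__":
    now = 1_000_000.0
    # Current behaviour: booting PGs with last_active = 0 look stuck at once.
    booting_old = [PGStat(active=False, last_active=0.0) for _ in range(3)]
    # Proposed behaviour: the clock starts when the PG is first seen.
    booting_new = [first_seen(now) for _ in range(3)]
    print(health(booting_old, now))                  # HEALTH_ERR
    print(health(booting_new, now))                  # HEALTH_OK
    print(health(booting_new, now + 301))            # HEALTH_ERR
    print(health(booting_old, now, min_inactive=0))  # HEALTH_OK (disabled)
```

With the first-seen timestamp, a restart only escalates to ERR if the PGs really stay inactive past the threshold. Setting mon_pg_min_inactive to 0, the less invasive option, simply switches the error off.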