HEALTH_ERR when (re)starting ceph-osds

Hello,

I hadn't noticed it before, but since https://github.com/ceph/ceph/pull/7253 was merged I see that, when restarting daemons on a healthy Ceph cluster, the cluster goes into HEALTH_ERR state with "$(random_number) pgs are stuck inactive for more than 300 seconds".
I looked at the commit and it turns out this will always happen on restart/boot, because booting PGs are inactive "by default" (the mons have never received any sign of life from them), not because they are actually stuck inactive.
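
To illustrate, here is a minimal self-contained model of that check. This is not the actual Ceph source, just the same shape of logic; the field and option names mirror pg_stat.last_active and mon_pg_stuck_threshold, but the types are simplified:

  // Simplified model of the mon's stuck-inactive check.
  #include <cstdint>
  #include <iostream>
  #include <map>
  #include <string>

  struct pg_stat_model {
    bool active;          // is the PG active?
    int64_t last_active;  // unix time the PG was last reported active
  };

  int count_stuck_inactive(const std::map<std::string, pg_stat_model>& pgs,
                           int64_t now, int64_t stuck_threshold) {
    const int64_t cutoff = now - stuck_threshold;
    int stuck = 0;
    for (const auto& p : pgs) {
      // A freshly registered PG has never been reported active, so
      // last_active is still 0 and is always older than the cutoff:
      // it counts as "stuck" immediately, not after the threshold.
      if (!p.second.active && p.second.last_active < cutoff)
        ++stuck;
    }
    return stuck;
  }

  int main() {
    std::map<std::string, pg_stat_model> pgs;
    pgs["1.0"] = {false, 0};       // just registered after a restart
    pgs["1.1"] = {true, 1000000};  // healthy, active
    // mon_pg_stuck_threshold defaults to 300 seconds
    std::cout << count_stuck_inactive(pgs, 1000300, 300)
              << " pgs stuck inactive\n";  // prints "1 pgs stuck inactive"
  }
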
One solution would be to set the pg_stat.last_* fields to the time the PG was first seen, so a PG would only become stuck mon_pg_stuck_threshold seconds after first registering, and not right away.
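
Continuing the toy model above, the fix would look something like this (again just a sketch, assuming a hypothetical register_pg() hook that runs when the mon first learns about a PG):

  // Sketch of the proposed fix: stamp last_active with "now" when a
  // PG is first registered, instead of leaving it at 0.
  void register_pg(std::map<std::string, pg_stat_model>& pgs,
                   const std::string& pgid, int64_t now) {
    pg_stat_model st;
    st.active = false;     // booting PGs really are inactive at first
    st.last_active = now;  // but the stuck clock starts ticking here,
                           // so the PG only becomes "stuck" after
                           // mon_pg_stuck_threshold seconds
    pgs[pgid] = st;
  }
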
Another, less invasive option would be to simply let the user disable this warning.
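
Until something like that exists, the closest knob I'm aware of is raising the threshold itself in ceph.conf, which is a blunt workaround rather than a real off switch:

  [mon]
  # workaround, not an off switch: give restarting PGs an hour
  # before the stuck check fires (default is 300 seconds)
  mon pg stuck threshold = 3600
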

What do you think?

With best regards / Pozdrawiam
Piotr Dałek




