> -----Original Message-----
> From: Wido den Hollander [mailto:wido@xxxxxxxx]
> Sent: Thursday, January 28, 2016 1:38 PM
>
> On 28-01-16 11:48, Piotr.Dalek@xxxxxxxxxxxxxx wrote:
> > Hello,
> >
> > I haven't noticed it before, but since merging
> > https://github.com/ceph/ceph/pull/7253 I see that, when restarting
> > daemons on a healthy Ceph cluster, it goes to HEALTH_ERR state with
> > "$(random_number) pgs are stuck inactive for more than 300 seconds".
> > I looked at the commit, and it turns out this will always occur on
> > restart/boot, because booting PGs are inactive "by default" (the mons
> > have never received any sign of life from them), not because they are
> > actually stuck inactive.
>
> Well, in that case, isn't the PR correct? But I see what you mean.

Actually, the only thing wrong here is that it reports PGs as having been
inactive for a prolonged period of time, when that is not true.

> > One solution would be to set the pg_stat.last_* fields to the time
> > when the PG was first seen, so that PGs become stuck
> > (mon_pg_stuck_threshold) seconds after first registering, and not
> > right away.
>
> That sounds like a good solution, you might want to take a look at:
> http://tracker.ceph.com/issues/14028

I'll take a look. Maybe we could fix two issues with one PR ;-)

With best regards / Pozdrawiam
Piotr Dałek
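The proposal in this thread (seed the pg_stat.last_* timestamps with the time a PG is first seen, so the stuck check fires only mon_pg_stuck_threshold seconds later) can be illustrated with a minimal Python sketch. PgStat, PgTracker, register_pg, report_active, stuck_inactive and seed_on_first_sight are hypothetical names, not Ceph's API. Only a single last_active timestamp is modeled, and the 300-second threshold is taken from the health message quoted above.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for Ceph's pg_stat_t: only the
# timestamp consulted by the "stuck inactive" check is modeled.
@dataclass
class PgStat:
    last_active: float = 0.0  # 0.0 means "the monitor has never heard from this PG"


class PgTracker:
    """Hypothetical monitor-side bookkeeping, not Ceph's actual PGMap code."""

    def __init__(self, stuck_threshold: float, seed_on_first_sight: bool = True):
        self.stuck_threshold = stuck_threshold  # plays the role of mon_pg_stuck_threshold
        self.seed_on_first_sight = seed_on_first_sight
        self.stats: dict[int, PgStat] = {}

    def register_pg(self, pgid: int, now: float) -> None:
        # First time the monitor sees this PG (e.g. right after a daemon restart).
        if pgid not in self.stats:
            stat = PgStat()
            if self.seed_on_first_sight:
                # Proposed behaviour: treat the PG as last active when first seen,
                # so it can only be reported stuck stuck_threshold seconds later.
                stat.last_active = now
            self.stats[pgid] = stat

    def report_active(self, pgid: int, now: float) -> None:
        # The PG reported that it is active; refresh its timestamp.
        self.stats[pgid].last_active = now

    def stuck_inactive(self, now: float) -> list[int]:
        # A PG counts as stuck if it has not been active within the threshold window.
        return sorted(
            pgid
            for pgid, stat in self.stats.items()
            if now - stat.last_active > self.stuck_threshold
        )


if __name__ == "__main__":
    old = PgTracker(300.0, seed_on_first_sight=False)  # unseeded timestamps
    new = PgTracker(300.0)                              # seeded on first sight
    for tracker in (old, new):
        tracker.register_pg(1, now=1_000_000.0)
    print(old.stuck_inactive(1_000_010.0))  # [1]: flagged 10 s after boot
    print(new.stuck_inactive(1_000_010.0))  # []: still within the threshold
    print(new.stuck_inactive(1_000_301.0))  # [1]: never went active in 300 s
```

With unseeded timestamps, now - last_active is effectively the whole epoch time, so every freshly registered PG exceeds the threshold immediately. Seeding at first sight keeps the check meaningful for PGs that really never go active.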