HEALTH_ERR when (re)starting ceph-osds

Hello,

I hadn't noticed it before, but since https://github.com/ceph/ceph/pull/7253 was merged I see that, when restarting daemons on a healthy Ceph cluster, the cluster goes into HEALTH_ERR state with "$(random_number) pgs are stuck inactive for more than 300 seconds".
I looked at the commit and it turns out this will always happen on restart/boot, because booting PGs are inactive "by default" (the mons have never received any sign of life from them), not because they are actually stuck inactive.
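
To illustrate, here is a minimal self-contained model of that check. This is not the actual Ceph source, just the same shape of logic; the field and option names mirror pg_stat.last_active and mon_pg_stuck_threshold, but the types are simplified:

  // Simplified model of the mon's stuck-inactive check.
  #include <cstdint>
  #include <iostream>
  #include <map>
  #include <string>

  struct pg_stat_model {
    bool active;          // is the PG active?
    int64_t last_active;  // unix time the PG was last reported active
  };

  int count_stuck_inactive(const std::map<std::string, pg_stat_model>& pgs,
                           int64_t now, int64_t stuck_threshold) {
    const int64_t cutoff = now - stuck_threshold;
    int stuck = 0;
    for (const auto& p : pgs) {
      // A freshly registered PG has never been reported active, so
      // last_active is still 0 and is always older than the cutoff:
      // it counts as "stuck" immediately, not after the threshold.
      if (!p.second.active && p.second.last_active < cutoff)
        ++stuck;
    }
    return stuck;
  }

  int main() {
    std::map<std::string, pg_stat_model> pgs;
    pgs["1.0"] = {false, 0};       // just registered after a restart
    pgs["1.1"] = {true, 1000000};  // healthy, active
    // mon_pg_stuck_threshold defaults to 300 seconds
    std::cout << count_stuck_inactive(pgs, 1000300, 300)
              << " pgs stuck inactive\n";  // prints "1 pgs stuck inactive"
  }
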
One solution would be to set the pg_stat.last_* fields to the time the PG was first seen, so a PG would only become stuck mon_pg_stuck_threshold seconds after first registering, and not right away.
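
Continuing the toy model above, the fix would look something like this (again just a sketch, assuming a hypothetical register_pg() hook that runs when the mon first learns about a PG):

  // Sketch of the proposed fix: stamp last_active with "now" when a
  // PG is first registered, instead of leaving it at 0.
  void register_pg(std::map<std::string, pg_stat_model>& pgs,
                   const std::string& pgid, int64_t now) {
    pg_stat_model st;
    st.active = false;     // booting PGs really are inactive at first
    st.last_active = now;  // but the stuck clock starts ticking here,
                           // so the PG only becomes "stuck" after
                           // mon_pg_stuck_threshold seconds
    pgs[pgid] = st;
  }
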
Another, less invasive option would be to simply let the user disable this warning.
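
Until something like that exists, the closest knob I'm aware of is raising the threshold itself in ceph.conf, which is a blunt workaround rather than a real off switch:

  [mon]
  # workaround, not an off switch: give restarting PGs an hour
  # before the stuck check fires (default is 300 seconds)
  mon pg stuck threshold = 3600
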

What do you think?

With best regards / Pozdrawiam
Piotr Dałek




