RE: HEALTH_ERR when (re)starting ceph-osd's

> -----Original Message-----
> From: Wido den Hollander [mailto:wido@xxxxxxxx]
> Sent: Thursday, January 28, 2016 1:38 PM
> 
> On 28-01-16 11:48, Piotr.Dalek@xxxxxxxxxxxxxx wrote:
> > Hello,
> >
> > I haven't noticed it before, but since merging
> https://github.com/ceph/ceph/pull/7253 I see that, when restarting
> daemons on a healthy ceph cluster, it goes to HEALTH_ERR state with
> "$(random_number) pgs are stuck inactive for more than 300 seconds".
> > I looked at the commit and it turns out it will always occur on
> restart/boot, as booting pgs are inactive "by default" (since mons never
> received any sign of life from them) - not because they're actually stuck
> inactive.
> 
> Well, in that case, isn't the PR correct? But I see what you mean.

Actually, the only thing wrong with it is that it reports PGs as having been inactive for a prolonged period of time when that isn't true.
 
> > One solution to this would be to set the pg_stat.last_* fields to the point
> where they were first seen, so they become stuck
> (mon_pg_stuck_threshold) seconds after first registering, and not right
> away.
> 
> That sounds like a good solution, you might want to take a look at:
> http://tracker.ceph.com/issues/14028

I'll take a look. Maybe we could fix two issues with one PR ;-)
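
To make the proposal a bit more concrete, here is a rough sketch of the idea in Python. It is not the actual monitor code: PgStat, register_pg and is_stuck_inactive are hypothetical names, and treating a never-reported timestamp as 0 is my assumption about why booting PGs look stuck right away. The 300 seconds is the value from the health message above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed to match the 300 seconds in the health message (mon_pg_stuck_threshold).
MON_PG_STUCK_THRESHOLD = 300.0


@dataclass
class PgStat:
    """Hypothetical stand-in for the mon's per-PG stats, not the real structure."""
    last_active: Optional[float] = None  # None: the mon never heard from this PG


def register_pg(stat: PgStat, now: float) -> None:
    """Proposed fix: on first sight, stamp last_active with the current time."""
    if stat.last_active is None:
        stat.last_active = now


def is_stuck_inactive(stat: PgStat, now: float,
                      threshold: float = MON_PG_STUCK_THRESHOLD) -> bool:
    """A PG counts as stuck if its last activity is older than the threshold."""
    # Assumption: an unset timestamp behaves like time 0, i.e. "inactive forever".
    last = stat.last_active if stat.last_active is not None else 0.0
    return now - last > threshold


if __name__ == "__main__":
    boot_time = 1000.0

    # Current behaviour: a freshly booted PG is reported stuck immediately.
    fresh = PgStat()
    print(is_stuck_inactive(fresh, boot_time))

    # With the proposed stamping, it only becomes stuck after the threshold.
    stamped = PgStat()
    register_pg(stamped, boot_time)
    print(is_stuck_inactive(stamped, boot_time))
    print(is_stuck_inactive(stamped, boot_time + MON_PG_STUCK_THRESHOLD + 1))
```

The point is only that the "stuck" clock starts when the mon first sees the PG, not at some implicit zero; a PG that really never becomes active is still reported once the threshold has passed.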

With best regards / Pozdrawiam
Piotr Dałek