> > The advised solution is to upgrade ceph only in HEALTH_OK state. And I
> > also read somewhere that it is bad to have your cluster for a long time
> > in a HEALTH_ERR state.
> >
> > But why is this bad?
>
> Aside from the obvious (errors are bad things!), many people have
> external monitoring systems that will alert them on the transitions
> between OK/WARN/ERR. If the system is stuck in ERR for a long time,
> they are unlikely to notice new errors or warnings. These systems can
> accumulate faults without the operator noticing.

All obvious, I would expect such an answer on a psychology mailing list ;)
I am mostly testing with ceph and trying to educate myself a bit. I am
asking because I had this error in Sep 2017; it disappeared when I changed
the crush reweight, reappeared in Jan 2018 after scrubbing, and now, after
adding the 4th node, it has disappeared again.

> > Why is this bad during upgrading?
>
> It depends what's gone wrong. For example:
> - If your cluster is degraded (fewer than the desired number of replicas
> of data), then taking more services offline (even briefly) to do an
> upgrade will create greater risk to the data by reducing the number of
> copies available.
> - If your system is in an error state because something has gone bad
> on disk, then recovering it with the same software that wrote the data
> is a more tested code path than running some newer code against a
> system left in a strange state by an older version.
>
> There will always be exceptions to this (e.g. where the upgrade is the
> fix for whatever caused the error), but the general-purpose advice is
> to get a system nice and clean before starting the upgrade.
>
> John
>
> > Can I quantify how bad it is? (like with large log/journal file?)
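
To make the "upgrade only in HEALTH_OK" advice actionable, a minimal
pre-upgrade gate could look like the sketch below. This is only an
illustration, assuming the standard ceph CLI is installed and can reach
the monitors with a readable client keyring; adapt it to your own
environment and tooling.

    #!/bin/sh
    # Refuse to start an upgrade unless the cluster reports HEALTH_OK.
    status="$(ceph health)"
    if [ "$status" = "HEALTH_OK" ]; then
        echo "Cluster is HEALTH_OK, proceeding with the upgrade."
    else
        echo "Cluster is in state '$status', not upgrading."
        # Show which checks are failing so they can be cleared first.
        ceph health detail
        exit 1
    fi

If your release appends summary text to the status line, matching on the
HEALTH_OK prefix (e.g. case "$status" in HEALTH_OK*) ... esac) instead of
strict equality is a safer variant.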