Re: Upgrading ceph with HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

"Marc Roos" <M.Roos@xxxxxxxxxxxxxxxxx> · Thu, 6 Sep 2018 13:59:43 +0200

Thanks, interesting to read. So in Luminous it is not really a problem. I 
was expecting to get into trouble with the monitors/MDS, because my 
failover takes quite long and I thought it was related to the damaged PG.

Luminous: "When the past intervals tracking structure was rebuilt around 
exactly the information required, it became extremely compact and 
relatively insensitive to extended periods of cluster unhealthiness" 

> >
> >
> > The advised solution is to upgrade ceph only in HEALTH_OK state. And I
> > also read somewhere that it is bad to have your cluster for a long time
> > in a HEALTH_ERR state.
> >
> > But why is this bad?

See https://ceph.com/community/new-luminous-pg-overdose-protection
under "Problems with past intervals"

"if the cluster becomes unhealthy, and especially if it remains 
unhealthy for an extended period of time, a combination of effects can 
cause problems."

"If a cluster is unhealthy for an extended period of time (e.g., days or 
even weeks), the past interval set can become large enough to require a 
significant amount of memory."
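
The "upgrade only in HEALTH_OK" advice quoted in this thread can be turned 
into a simple pre-upgrade gate. The following is only a minimal sketch in 
Python, assuming the health report is passed in as JSON text (for example 
the output of `ceph health --format json`) and that it carries a top-level 
"status" field, with "overall_status" as a fallback for older releases. 
The function name safe_to_upgrade and the sample report are made up for 
illustration and do not come from a real cluster.

```python
import json


def safe_to_upgrade(health_json: str) -> bool:
    """Return True only if the reported cluster status is HEALTH_OK."""
    health = json.loads(health_json)
    # Assumption: the report has a top-level "status" key; fall back to
    # "overall_status" in case an older release produced the report.
    status = health.get("status") or health.get("overall_status")
    return status == "HEALTH_OK"


# Hypothetical report resembling a cluster with one inconsistent PG.
sample = (
    '{"status": "HEALTH_ERR", '
    '"checks": {"PG_DAMAGED": {"severity": "HEALTH_ERR"}}}'
)

if __name__ == "__main__":
    print(safe_to_upgrade(sample))
```

With the sample report above the gate refuses the upgrade, which matches 
the advice to repair the inconsistent PG first and upgrade afterwards.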

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com