Re: why sudden (and brief) HEALTH_ERR

lists <lists@xxxxxxxxxxxxx> · Wed, 4 Oct 2017 09:59:05 +0200

OK, thanks for the feedback, Piotr and Dan!

MJ

On 4-10-2017 9:38, Dan van der Ster wrote:
Since Jewel (AFAIR), when OSDs are (re)started, PG status is reset to "never
contacted", so "pgs are stuck inactive for more than 300 seconds" is
reported until the OSDs re-establish connections with each other.

Also, as far as I can tell, the last_active state isn't updated very regularly.
On our cluster I have increased this timeout:

--mon_pg_stuck_threshold: 1800

(which helps suppress these bogus HEALTH_ERRs)
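
For anyone wanting to persist that setting, here is a minimal sketch of how it might look in ceph.conf. The [global] section is an assumption (chosen so that whichever daemon evaluates stuck PGs on a given release picks it up); the option name and value are the ones quoted above. Check the documentation for your release before relying on it.

```ini
# ceph.conf fragment (sketch only; section placement is an assumption)
[global]
# Seconds a PG may stay out of the active state before it is reported as stuck.
# 1800 is the value quoted above; the health message above mentions 300 seconds.
mon_pg_stuck_threshold = 1800
```

Raising the threshold only delays the warning. It does not change how quickly OSDs re-peer after a restart, so a PG that is genuinely stuck will also be reported later.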
