On Wed, Oct 4, 2017 at 9:08 AM, Piotr Dałek <piotr.dalek@xxxxxxxxxxxx> wrote:
> On 17-10-04 08:51 AM, lists wrote:
>>
>> Hi,
>>
>> Yesterday I chowned our /var/lib/ceph to ceph, to completely finalize our
>> jewel migration, and noticed something interesting.
>>
>> After I brought the OSDs I had just chowned back up, the system had some
>> recovery to do. During that recovery, the system went to HEALTH_ERR for a
>> short moment.
>>
>> See below for consecutive ceph -s outputs:
>>
>>> [..]
>>> root@pm2:~# ceph -s
>>>     cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
>>>      health HEALTH_ERR
>>>             2 pgs are stuck inactive for more than 300 seconds
>
> ^^ that.
>
>>>             761 pgs degraded
>>>             2 pgs recovering
>>>             181 pgs recovery_wait
>>>             2 pgs stuck inactive
>>>             273 pgs stuck unclean
>>>             543 pgs undersized
>>>             recovery 1394085/8384166 objects degraded (16.628%)
>>>             4/24 in osds are down
>>>             noout flag(s) set
>>>      monmap e3: 3 mons at
>>> {0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
>>>             election epoch 256, quorum 0,1,2 0,1,2
>>>      osdmap e10230: 24 osds: 20 up, 24 in; 543 remapped pgs
>>>             flags noout,sortbitwise,require_jewel_osds
>>>       pgmap v36531146: 1088 pgs, 2 pools, 10703 GB data, 2729 kobjects
>>>             32724 GB used, 56656 GB / 89380 GB avail
>>>             1394085/8384166 objects degraded (16.628%)
>>>                  543 active+undersized+degraded
>>>                  310 active+clean
>>>                  181 active+recovery_wait+degraded
>>>                   26 active+degraded
>>>                   13 active
>>>                    9 activating+degraded
>>>                    4 activating
>>>                    2 active+recovering+degraded
>>> recovery io 133 MB/s, 37 objects/s
>>>   client io 64936 B/s rd, 9935 kB/s wr, 0 op/s rd, 942 op/s wr
>>> [..]
>>
>> It only lasted a very short time, but it did worry me a bit. Fortunately,
>> we went back to the expected HEALTH_WARN very quickly and everything
>> finished fine, so I guess there is nothing to worry about.
>>
>> But I'm curious: can anyone explain WHY we got a brief HEALTH_ERR?
>>
>> There are no SMART errors, apply and commit latency are all within the
>> expected ranges; the system basically is healthy.
>>
>> Curious :-)
>
> Since Jewel (AFAIR), when (re)starting OSDs, pg status is reset to "never
> contacted", resulting in "pgs are stuck inactive for more than 300 seconds"
> being reported until the OSDs regain connections between themselves.

Also, the last_active state isn't updated very regularly, as far as I can
tell. On our cluster I have increased this timeout:

    --mon_pg_stuck_threshold: 1800

(which helps suppress these bogus HEALTH_ERRs)

--
dan
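
For anyone who wants to do the same, here is a minimal sketch of how I'd
raise that threshold. This assumes a Jewel-era cluster; double-check the
option name and default against your release with "ceph daemon mon.<id>
config show | grep pg_stuck" before relying on it. The [mon] section
placement and the injectargs invocation below are my assumptions about a
typical setup, not something taken from the original posts:

    # ceph.conf on the monitor hosts: persist the higher threshold
    # (default is 300 seconds, i.e. the "more than 300 seconds" message)
    [mon]
        mon pg stuck threshold = 1800

    # apply at runtime without restarting the mons
    ceph tell mon.* injectargs '--mon_pg_stuck_threshold 1800'

Note the trade-off: a 1800s threshold also delays reporting of PGs that
are genuinely stuck, so it mainly makes sense if you routinely see these
transient inactive reports right after OSD restarts.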