Re: ceph health "overall_status": "HEALTH_WARN"

Monish Selvaraj <monish@xxxxxxxxxxxxxxx> · Mon, 25 Jul 2022 14:56:26 +0530

Hi all,

Recently, I deployed Ceph (Pacific) with ceph orch on my nodes: 5 mons, 5 mgrs,
238 OSDs and 5 RGWs.

Yesterday, 4 OSDs went out and 2 RGWs went down. So I restarted all the RGWs
with "ceph orch restart rgw.rgw". Two minutes later, all the RGW nodes went
down.

Then I brought the 4 OSDs back up and waited for the cluster to reach
HEALTH_OK. But the RGW services are up and running and the port is bound.

RGW logs :

root@cephr04:/var/log/ceph/0df2c8fe-fdf1-11ec-9713-b175dcec685a# tail -800 ceph-client.rgw.rgw.cephr04.wpfaui.log
2022-07-25T06:06:37.528+0000 7f89423035c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:06:37.528+0000 7f89423035c0 0 ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process radosgw, pid 6
2022-07-25T06:06:37.528+0000 7f89423035c0 0 framework: beast
2022-07-25T06:06:37.528+0000 7f89423035c0 0 framework conf key: port, val: 80
2022-07-25T06:06:37.528+0000 7f89423035c0 1 radosgw_Main not setting numa affinity
2022-07-25T06:11:37.529+0000 7f892d9be700 -1 Initialization timeout, failed to initialize
2022-07-25T06:11:47.841+0000 7fae36b985c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:11:47.841+0000 7fae36b985c0 0 ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process radosgw, pid 7
2022-07-25T06:11:47.841+0000 7fae36b985c0 0 framework: beast
2022-07-25T06:11:47.841+0000 7fae36b985c0 0 framework conf key: port, val: 80
2022-07-25T06:11:47.841+0000 7fae36b985c0 1 radosgw_Main not setting numa affinity
2022-07-25T06:16:47.842+0000 7fae22253700 -1 Initialization timeout, failed to initialize
2022-07-25T06:16:58.114+0000 7fb4bac385c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:16:58.114+0000 7fb4bac385c0 0 ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process radosgw, pid 7
2022-07-25T06:16:58.114+0000 7fb4bac385c0 0 framework: beast
2022-07-25T06:16:58.114+0000 7fb4bac385c0 0 framework conf key: port, val: 80
2022-07-25T06:16:58.114+0000 7fb4bac385c0 1 radosgw_Main not setting numa affinity
2022-07-25T06:21:58.111+0000 7fb4a62f3700 -1 Initialization timeout, failed to initialize
2022-07-25T06:22:08.359+0000 7f4b33dbd5c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:22:08.359+0000 7f4b33dbd5c0 0 ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process radosgw, pid 7
2022-07-25T06:22:08.359+0000 7f4b33dbd5c0 0 framework: beast
2022-07-25T06:22:08.359+0000 7f4b33dbd5c0 0 framework conf key: port, val: 80
2022-07-25T06:22:08.359+0000 7f4b33dbd5c0 1 radosgw_Main not setting numa affinity
2022-07-25T06:25:03.189+0000 7fa6920085c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:25:03.189+0000 7fa6920085c0 0 ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process radosgw, pid 7
2022-07-25T06:25:03.189+0000 7fa6920085c0 0 framework: beast
2022-07-25T06:25:03.189+0000 7fa6920085c0 0 framework conf key: port, val: 80
2022-07-25T06:25:03.189+0000 7fa6920085c0 1 radosgw_Main not setting numa affinity
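
Going by the timestamps in the log, each radosgw start is followed by the
"Initialization timeout" message about five minutes later, after which the
daemon is started again. A minimal Python sketch to pull that interval out of
the log; the function name init_timeouts and the SAMPLE excerpt are only for
illustration and are not part of Ceph. The interval would be consistent with a
300 second init timeout (I believe the option is rgw_init_timeout, but treat
that as an assumption and check it on your own cluster):

```python
from datetime import datetime

# Excerpt of the RGW log from this mail (start line and timeout line only).
SAMPLE = """\
2022-07-25T06:06:37.528+0000 7f89423035c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:11:37.529+0000 7f892d9be700 -1 Initialization timeout, failed to initialize
2022-07-25T06:11:47.841+0000 7fae36b985c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:16:47.842+0000 7fae22253700 -1 Initialization timeout, failed to initialize
2022-07-25T06:16:58.114+0000 7fb4bac385c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:21:58.111+0000 7fb4a62f3700 -1 Initialization timeout, failed to initialize
2022-07-25T06:22:08.359+0000 7f4b33dbd5c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-07-25T06:25:03.189+0000 7fa6920085c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
"""


def init_timeouts(lines):
    """Yield (start, timeout, seconds) for every start that ended in an init timeout."""
    start = None
    for line in lines:
        # Expected layout: <timestamp> <thread id> <level> <message>
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        try:
            ts = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S.%f%z")
        except ValueError:
            continue  # wrapped continuation line or unrelated text
        message = parts[3]
        if message.startswith("deferred set uid:gid"):
            # First line a radosgw process logs in this excerpt: treat as the start.
            start = ts
        elif message.startswith("Initialization timeout") and start is not None:
            yield start, ts, (ts - start).total_seconds()
            start = None


if __name__ == "__main__":
    for started, timed_out, seconds in init_timeouts(SAMPLE.splitlines()):
        print(f"started {started.time()}  timed out {timed_out.time()}  after {seconds:.3f}s")
```

To run it against the real file, pass an open file handle instead of
SAMPLE.splitlines(). The last two starts in the excerpt have no timeout line
after them, so they are not reported.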

Environment:

OS Ubuntu 20.04
Kernel 5.4.0-122-generic
Docker version 20.10.17
Ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)

On Mon, Jul 25, 2022 at 2:54 PM Konstantin Shalygin <k0ste@xxxxxxxx> wrote:

> Hi,
>
> Mimic has many health issues like this.
> Mimic has been EOL for years; I suggest you upgrade to at least Nautilus
> 14.2.22.
>
>
> k
>
> > On 25 Jul 2022, at 11:45, Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi all,
> >
> > I made a strange observation on our cluster. The command
> > ceph status -f json-pretty returns at the beginning
> >
> >    "health": {
> >        "checks": {},
> >        "status": "HEALTH_OK",
> >        "overall_status": "HEALTH_WARN"
> >    },
> >
> > I'm a bit worried about what "overall_status": "HEALTH_WARN" could mean
> > in this context. I can't seem to find any more info about that. Ceph health
> > detail returns HEALTH_OK.
> >
> > Any hint is welcome. version is mimic 13.2.10.
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
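
On the quoted "overall_status" question: as far as I know, "overall_status" is
a legacy field kept for older clients, and "status" is the one that matches
what ceph health detail reports. That is my understanding rather than
something confirmed in this thread. A minimal Python sketch, assuming the JSON
layout shown in the quoted output, that reads "status" first and only falls
back to "overall_status" when "status" is missing (the function name
effective_health is made up for this example):

```python
import json


def effective_health(status_json: str) -> str:
    """Return the health string from `ceph status -f json` style output.

    Prefers the "status" field; falls back to "overall_status" only if
    "status" is absent (assumption: "overall_status" is a legacy field).
    """
    health = json.loads(status_json).get("health", {})
    return health.get("status") or health.get("overall_status", "UNKNOWN")


# Same fields as in the quoted output above.
sample = """
{
    "health": {
        "checks": {},
        "status": "HEALTH_OK",
        "overall_status": "HEALTH_WARN"
    }
}
"""

if __name__ == "__main__":
    print(effective_health(sample))
```

With the quoted output this reports the value of "status", so a monitoring
script written this way would follow the same field as ceph health detail.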