Re: External RGW always down

Hi Eugen,

Yes, I have inactive PGs when the OSDs go down. I then start the OSDs
manually, but the RGWs still fail to start.
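
For reference, these are roughly the checks I run at that point (standard
ceph CLI; the RGW daemon name below is just a placeholder):

    ceph health detail                    # inactive/down PGs, full OSDs
    ceph pg dump_stuck inactive           # which PGs are stuck inactive
    ceph orch ps --daemon-type rgw        # cephadm: per-daemon RGW state
    ceph orch daemon restart rgw.<name>   # retry an RGW once PGs are active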

Only upgrading to a newer version has resolved the issue so far, and we
have faced it two times.
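
For completeness, we do the upgrades via cephadm, roughly like this
(assuming an orchestrator-managed cluster):

    ceph orch upgrade start --ceph-version 17.2.1
    ceph orch upgrade status   # watch progress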

I don't know why it is happening. Could it be because the RGWs run on
separate machines?

On Sat, Sep 10, 2022 at 11:27 PM Eugen Block <eblock@xxxxxx> wrote:

> You didn’t respond to the other questions. If you want people to be
> able to help, you need to provide more information. If your OSDs fail,
> do you have inactive PGs? Or do you have full OSDs, which would prevent
> the RGWs from starting? I’m assuming that if you fix your OSDs, the RGWs
> would start working again. But then again, we still don’t know
> anything about the current situation.
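>
> A quick way to check both conditions (plain ceph CLI, nothing
> cluster-specific assumed):
>
>     ceph -s                     # overall health and inactive PG count
>     ceph osd df tree            # per-OSD utilization, full/nearfull OSDs
>     ceph health detail | grep -i -e full -e inactive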
>
> Zitat von Monish Selvaraj <monish@xxxxxxxxxxxxxxx>:
>
> > Hi Eugen,
> >
> > Below is the log output,
> >
> > 2022-09-07T12:03:42.893+0000 7fdd23fdc5c0  0 framework: beast
> > 2022-09-07T12:03:42.893+0000 7fdd23fdc5c0  0 framework conf key: port, val: 80
> > 2022-09-07T12:03:42.893+0000 7fdd23fdc5c0  1 radosgw_Main not setting numa affinity
> > 2022-09-07T12:03:42.893+0000 7fdd23fdc5c0  1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
> > 2022-09-07T12:03:42.893+0000 7fdd23fdc5c0  1 D3N datacache enabled: 0
> > 2022-09-07T12:03:53.313+0000 7fdd23fdc5c0  1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yi>
> > 2022-09-07T12:03:53.313+0000 7fdd23fdc5c0  1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yi>
> > 2022-09-07T12:08:42.891+0000 7fdd1661c700 -1 Initialization timeout, failed to initialize
> > 2022-09-07T12:08:53.395+0000 7f69017095c0  0 deferred set uid:gid to 167:167 (ceph:ceph)
> > 2022-09-07T12:08:53.395+0000 7f69017095c0  0 ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable), process radosgw, pid 7
> > 2022-09-07T12:08:53.395+0000 7f69017095c0  0 framework: beast
> > 2022-09-07T12:08:53.395+0000 7f69017095c0  0 framework conf key: port, val: 80
> > 2022-09-07T12:08:53.395+0000 7f69017095c0  1 radosgw_Main not setting numa affinity
> > 2022-09-07T12:08:53.395+0000 7f69017095c0  1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
> > 2022-09-07T12:08:53.395+0000 7f69017095c0  1 D3N datacache enabled: 0
> > 2022-09-07T12:09:03.747+0000 7f69017095c0  1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yi>
> > 2022-09-07T12:09:03.747+0000 7f69017095c0  1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yi>
> > 2022-09-07T12:13:53.397+0000 7f68f3d49700 -1 Initialization timeout, failed to initialize
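> >
> > A side note on this log: the "Initialization timeout" fires exactly
> > 300 s after startup (12:03:42 -> 12:08:42), which matches the default
> > rgw_init_timeout of 300 s, i.e. the RGW gives up waiting for its RADOS
> > pools to become reachable. To get more detail, or to give the daemon
> > more time while the OSDs recover, something like this should work
> > (standard Quincy config options; the values are just examples):
> >
> >     ceph config set client.rgw rgw_init_timeout 1200   # wait up to 20 min
> >     ceph config set client.rgw debug_rgw 20            # verbose RGW log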
> >
> > I installed the cluster with Quincy from the start.
> >
> >
> > On Sat, Sep 10, 2022 at 4:02 PM Eugen Block <eblock@xxxxxx> wrote:
> >
> >> What troubleshooting have you tried? You don’t provide any log output
> >> or information about the cluster setup, for example ceph osd tree or
> >> ceph status. Are the failing OSDs random, or do they all belong to the
> >> same pool? Any log output from the failing OSDs and the RGWs might
> >> help, otherwise it’s just wild guessing. Is the cluster a new
> >> installation with cephadm or an older cluster upgraded to Quincy?
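> >>
> >> Something like the following would be a useful starting point (plain
> >> ceph CLI, run on a mon/mgr node):
> >>
> >>     ceph status
> >>     ceph osd tree
> >>     ceph osd pool ls detail
> >>     ceph crash ls            # recent daemon crashes, incl. OSDs/RGWs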
> >>
> >> Zitat von Monish Selvaraj <monish@xxxxxxxxxxxxxxx>:
> >>
> >> > Hi all,
> >> >
> >> > I have one critical issue in my prod cluster. It starts when the
> >> > customer's incoming data reaches around 600 MiB.
> >> >
> >> > *8 to 20 of my 238 OSDs* go down. I bring the OSDs up manually, but
> >> > after a few minutes all of my RGWs crash.
> >> >
> >> > We did some troubleshooting, but nothing worked. Upgrading Ceph from
> >> > 17.2.0 to 17.2.1 resolved it. We have faced the issue two times, and
> >> > both times we fixed it by upgrading Ceph.
> >> >
> >> > *Node schema:*
> >> >
> >> > *Node 1 to Node 5 --> mon, mgr and osds*
> >> > *Node 6 to Node 15 --> only osds*
> >> > *Node 16 to Node 20 --> only rgws*
> >> >
> >> > Kindly check this issue and let me know the correct troubleshooting
> >> > method.
>
>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



