Hi Pavin,

Thanks for the reply. I'm honestly a bit at a loss, as this worked perfectly without any issue up until the rebalance of the cluster. Orchestrator has been great; aside from this (which I suspect is not orchestrator related), I haven't had any issues.

In terms of logs, I'm not sure where to start looking in this new containerized environment as far as individual Ceph processes go -- I assumed everything would be centrally collected within orch. Connecting into the podman container of an RGW, there are no logs in /var/log/ceph aside from ceph-volume. My ceph.conf is minimal, with only monitors defined. The only log I'm able to pull is the following:

# podman logs 35d4ac5445ca

INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s
Traceback (most recent call last):
  File "/usr/bin/ceph-crash", line 113, in <module>
    main()
  File "/usr/bin/ceph-crash", line 109, in main
    time.sleep(args.delay * 60)
TypeError: handler() takes 1 positional argument but 2 were given
INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s

Looks like the RGW daemon is crashing. How do I get logs to persist? I suspect I won't be able to use the orchestrator to push down the config and would have to manipulate it within the container image itself. I also attempted to redeploy the RGW containers, without success. (I've jotted down the commands I plan to try next at the bottom of this mail, below the quoted thread.)

On Tue, Dec 27, 2022 at 10:39 AM Pavin Joseph <me@xxxxxxxxxxxxxxx> wrote:

> Here are the first things I'd check in your situation:
>
> 1. Logs
> 2. Is the RGW HTTP server running on its port?
> 3. Re-check config, including authentication.
>
> ceph orch is too new and didn't pass muster in our own internal testing.
> You're braver than most for using it in production.
>
> Pavin.
>
> On 27-Dec-22 8:48 PM, Deep Dish wrote:
> > Quick update:
> >
> > - I followed the documentation and ran the following:
> >
> > # ceph dashboard set-rgw-credentials
> >
> > Error EINVAL: No RGW credentials found, please consult the documentation
> > on how to enable RGW for the dashboard.
> >
> > - I see dashboard credentials configured (all of this was working fine
> > before):
> >
> > # ceph dashboard get-rgw-api-access-key
> >
> > P?????????????????G (?s masked out)
> >
> > Seems to me like my RGW config is non-existent / corrupted for some
> > reason. When trying to curl an RGW directly I get "connection refused".
> >
> > On Tue, Dec 27, 2022 at 9:41 AM Deep Dish <deeepdish@xxxxxxxxx> wrote:
> >
> >> I built a net-new Quincy cluster (17.2.5) using ceph orch as follows:
> >>
> >> 2x mgrs
> >> 4x rgw
> >> 5x mon
> >> 5x mds
> >> 6x osd hosts w/ 10 drives each --> will be growing to 7 osd hosts in
> >> the coming days.
> >>
> >> I migrated all data from my legacy Nautilus cluster (via rbd-mirror,
> >> rclone for S3 buckets, etc.). Everything moved over successfully
> >> without issue.
> >>
> >> The cluster then went through a series of rebalancing events (adding
> >> capacity / osd nodes, changing the fault domain for EC volumes).
> >>
> >> It has settled now; however, after all of this my RGW nodes are no
> >> longer part of the cluster -- meaning Ceph doesn't recognize / detect
> >> them, despite containers, networking, etc. all being set up correctly.
> >> This also means I'm unable to manage any RGW functions (via the
> >> dashboard or CLI). As an example via the CLI (within the cephadm shell):
> >>
> >> # radosgw-admin pools list
> >>
> >> could not list placement set: (2) No such file or directory
> >>
> >> I have data in buckets. How can I get my RGWs back online?
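P.S. Writing down what I plan to try next for the logging question, mostly lifted from the cephadm docs rather than anything I've verified on this cluster yet, so treat it as a sketch. My understanding is that the containerized daemons log to journald on their host rather than to /var/log/ceph inside the container, which would explain the empty directory:

# cephadm ls
# cephadm logs --name rgw.<service>.<host>.<suffix>

(or directly: journalctl -u ceph-<fsid>@rgw.<service>.<host>.<suffix> -- the daemon name above is just an illustrative placeholder, I'd take the exact name from cephadm ls.)

To persist logs to files under /var/log/ceph/<fsid>/ on each host, and to turn up RGW verbosity before the next redeploy, the options should be pushable through the mon config database rather than by touching the container image:

# ceph config set global log_to_file true
# ceph config set global mon_cluster_log_to_file true
# ceph config set client.rgw.<instance> debug_rgw 20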
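Also, on Pavin's points 2 and 3: the "connection refused" from curl suggests the frontend isn't listening at all, so before touching credentials I'll confirm the port and then re-push the RGW service through the orchestrator. Again just a sketch of the commands I intend to run, with service/host names as placeholders:

On an RGW host:

# ss -tlnp | grep radosgw
# curl -v http://<rgw-host>:80/

(port 80 being the cephadm default for the RGW frontend, unless the service spec overrides it)

From the cephadm shell:

# ceph orch ls rgw --export
# ceph orch ps --daemon-type rgw
# ceph orch redeploy <rgw-service-name>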
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx