Re: Ceph orchestrator not refreshing device list

Hi Bob,
Have you tried restarting the active mgr? (Sometimes the mgr gets stuck and
prevents the orchestrator from working correctly.)
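A minimal sketch, assuming a recent release where `ceph mgr fail` with no
argument fails over the currently active mgr (on older releases you pass the
active daemon's name, e.g. `ceph mgr fail <active-mgr-name>`):

# ceph mgr fail

The orchestrator module comes back up on the newly active mgr, which often
clears this kind of stuck state.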
Regarding the orchestrator device scan: have a look at the ceph-volume.log on
the corresponding host. You will find it under
/var/log/ceph/CLUSTER-ID/ceph-volume.log; this log is written by the device
scan that the orchestrator triggers. It may also help to look at the cephadm
debug logs - see
https://docs.ceph.com/en/latest/cephadm/operations/#watching-cephadm-log-messages
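Roughly what I would run while reproducing the problem (CLUSTER-ID is a
placeholder for your cluster fsid; the debug commands are the ones from the
page linked above):

On the OSD host:
# tail -f /var/log/ceph/CLUSTER-ID/ceph-volume.log

On a node with the admin keyring:
# ceph config set mgr mgr/cephadm/log_to_cluster_level debug
# ceph -W cephadm --watch-debug

and set the level back to info once you're done:
# ceph config set mgr mgr/cephadm/log_to_cluster_level info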
Cheers,
tobi

On Wed, 23 Oct 2024 at 20:15, Bob Gibson <rjg@xxxxxxxxxx> wrote:

> Sorry to resurrect this thread, but while I was able to get the cluster
> healthy again by manually creating the osd, I'm still unable to manage osds
> using the orchestrator.
>
> The orchestrator is generally working, but it appears to be unable to scan
> devices. Immediately after failing out the mgr `ceph orch device ls` will
> display device status from >4 weeks ago, which was when we converted the
> cluster to be managed by cephadm. Eventually the orchestrator will attempt
> to refresh its device status. At this point `ceph orch device ls` stops
> displaying any output at all. I can reproduce this state almost immediately
> if I run `ceph orch device ls --refresh` to force an immediate refresh. The
> mgr log shows events like the following just before `ceph orch device ls`
> stops reporting output (one event for every osd node in the cluster):
>
> "Detected new or changed devices on ceph-osd31”
>
> Here are the osd services in play:
>
> # ceph orch ls osd
> NAME            PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> osd                         95  8m ago     -    <unmanaged>
> osd.ceph-osd31               4  8m ago     6d   ceph-osd31
>
> # ceph orch ls osd --export
> service_type: osd
> service_name: osd
> unmanaged: true
> spec:
>   filter_logic: AND
>   objectstore: bluestore
> ---
> service_type: osd
> service_id: ceph-osd31
> service_name: osd.ceph-osd31
> placement:
>   hosts:
>   - ceph-osd31
> spec:
>   data_devices:
>     rotational: 0
>     size: '3TB:'
>   encrypted: true
>   filter_logic: AND
>   objectstore: bluestore
>
> I tried deleting the default “osd” service in case it was somehow
> conflicting with my per-node spec, but it looks like that’s not allowed, so
> I assume any custom osd service specs override the unmanaged default.
>
> # ceph orch rm osd
> Invalid service 'osd'. Use 'ceph orch ls' to list available services.
>
> My hunch is that some persistent state is corrupted, or there’s something
> else preventing the orchestrator from successfully refreshing its device
> status, but I don’t know how to troubleshoot this. Any ideas?
>
> Cheers,
> /rjg
>
> P.S. @Eugen: When I first started this thread you said it was unnecessary
> to destroy an osd to convert it from unmanaged to managed. Can you explain
> how this is done? Although we want to recreate the osds to enable
> encryption, it would save time and avoid unnecessary wear on the SSDs while
> troubleshooting.
>
> On Oct 16, 2024, at 2:45 PM, Eugen Block <eblock@xxxxxx> wrote:
>
> Glad to hear it worked out for you!
>
> Quoting Bob Gibson <rjg@xxxxxxxxxx>:
>
> I’ve been away on vacation and just got back to this. I’m happy to
> report that manually recreating the OSD with ceph-volume and then
> adopting it with cephadm fixed the problem.
>
> Thanks again for your help Eugen!
>
> Cheers,
> /rjg
>
> On Sep 29, 2024, at 10:40 AM, Eugen Block <eblock@xxxxxx> wrote:
>
> Okay, apparently this is not what I was facing. I see two other
> options right now. The first would be to purge osd.88 from the crush
> tree entirely.
> The second approach would be to create the osd manually with plain
> ceph-volume (not cephadm ceph-volume), which gives you a legacy osd (you'd
> get warnings about a stray daemon). If that works, adopt the osd with
> cephadm.
> I don't have a better idea right now.
>
>


-- 
Best Regards,

Tobias Fischer

Head of Ceph
Clyso GmbH
p: +49 89 2152527 41
a: Hohenzollernstraße 27 | 80801 München | Germany
w: https://clyso.com | e: tobias.fischer@xxxxxxxxx

We are hiring: https://www.clyso.com/jobs/
---
Managing Director: Dipl. Inf. (FH) Joachim Kraftmayer
Registered office: Utting am Ammersee
Commercial register at the district court of: Augsburg
Commercial register number: HRB 25866
VAT ID no.: DE275430677
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



