Re: Module 'cephadm' has failed: invalid literal for int() with base 10:

Eugen Block <eblock@xxxxxx> · Wed, 12 Apr 2023 07:09:32 +0000

Hi,

have you tried a mgr failover?

Quoting Duncan M Tooke <duncan.tooke@xxxxxxxxxxxx>:

Hi,

Our Ceph cluster is in an error state with the message:

# ceph status
  cluster:
    id:     58140ed2-4ed4-11ed-b4db-5c6f69756a60
    health: HEALTH_ERR
            Module 'cephadm' has failed: invalid literal for int() with base 10: '352.broken'

This happened after trying to re-add an OSD which had failed.
Adopting it back into Ceph failed because a directory was
causing problems in /var/lib/ceph/{cephid}/osd.352. To re-add the
OSD I renamed it to osd.352.broken (rather than deleting it), re-ran
the command, and then everything worked perfectly. Then, 5 minutes
later, the Ceph orchestrator went into "HEALTH_ERR".
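
The error text is exactly what Python's built-in int() raises when given a non-numeric string. Below is a minimal sketch of how the renamed directory could trigger it, assuming the OSD id is derived from the directory name by splitting at the first dot. The helper names parse_osd_id and try_parse are hypothetical and are not taken from cephadm's code.

```python
def parse_osd_id(dir_name: str) -> int:
    """Hypothetical helper: split '<type>.<id>' at the first dot and
    convert the id part to an integer. This is an assumption about how
    the name might be parsed, not cephadm's actual implementation."""
    _daemon_type, daemon_id = dir_name.split(".", 1)
    return int(daemon_id)


def try_parse(dir_name: str):
    """Return the parsed id, or the ValueError message if parsing fails."""
    try:
        return parse_osd_id(dir_name)
    except ValueError as exc:
        return str(exc)


print(try_parse("osd.352"))         # a plain numeric id parses fine
print(try_parse("osd.352.broken"))  # the renamed directory does not
```

Under that assumption, any leftover directory whose name starts with "osd." but does not end in a plain integer would produce the same failure, which is consistent with the error appearing only after the rename.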

I've removed that directory, but "cephadm" isn't cleaning up after
itself. Does anyone know if there's a way to clear the cached entry
for this directory, which it tried to inventory and failed on?

Thanks,

Duncan
--
Dr Duncan Tooke | Research Cluster Administrator
Centre for Computational Biology, Weatherall Institute of Molecular Medicine,
University of Oxford, OX3 9DS
www.imm.ox.ac.uk

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
