Re: ceph octopus mysterious OSD crash

On 3/19/21 2:20 AM, Philip Brown wrote:
Yup, cephadm and orch were used to set all this up.

Current state of things:

ceph osd tree shows

  33    hdd    1.84698              osd.33       destroyed         0  1.00000


^^ Destroyed, ehh, this doesn't look good to me. Ceph thinks this OSD is destroyed. Do you know what might have happened to osd.33? Did you perform a "kill an OSD" exercise while testing?

AFAIK you can't fix that anymore. You will have to remove it and redeploy it. It might even get a new osd id.
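
If you do end up removing it by hand, a rough sketch (purge should take care of the CRUSH entry, the cephx key and the osdmap entry in one go; please double-check against the Octopus docs before running it):

   ceph osd purge 33 --yes-i-really-mean-it    # remove osd.33 from the CRUSH map, auth database and osdmap

Since you deployed with cephadm you could also look at "ceph orch osd rm 33 --replace", but I have not used that path myself.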



cephadm logs --name osd.33 --fsid xx-xx-xx-xx

along with the systemctl output I had already seen, showed me new things such as

ceph-osd[1645438]: did not load config file, using default settings.

ceph-osd[1645438]: 2021-03-18T14:31:32.990-0700 7f8bf14e3bc0 -1 parse_file: filesystem error: cannot get file size: No such file or directory

This suggested to me that I needed to copy /etc/ceph/ceph.conf over to the OSD node, which I did.
I then also copied over the admin key and generated a fresh bootstrap-osd keyring with it, just for good measure, with
   ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
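
For what it's worth, copying the full ceph.conf works, but a cephadm-managed node normally only needs a minimal config. Since you copied the admin key over, something along these lines on the OSD node itself should do (just a sketch, adjust paths to your setup):

   ceph config generate-minimal-conf > /etc/ceph/ceph.conf                          # fsid + mon addresses only
   ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring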



I had saved the previous output of ceph-volume lvm list
and on the OSD node, ran

ceph-volume lvm prepare --data xxxx --block.db xxxx

But it says the OSD is already prepared.


I tried an activate... it tells me

--> ceph-volume lvm activate successful for osd ID: 33



but now the cephadm logs output shows me


ceph-osd[1677135]: 2021-03-18T17:57:47.982-0700 7ff64593f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]



Not the best error message :-}


Indeed, it would be nice to have a reference for that [2]. But I think you are getting this because of the destroyed OSD. I would check the cephadm documentation on how to replace an OSD. Does that exist? We had a large thread about this "container" topic (see "ceph-ansible in Pacific and beyond?").
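
One thing worth checking, assuming the usual paths (a cephadm OSD keeps its keyring under /var/lib/ceph/<fsid>/osd.33/ on the host, a legacy ceph-volume one under /var/lib/ceph/osd/ceph-33/): compare the key the monitors have with the key the daemon presents:

   ceph auth get osd.33                      # what the monitors expect; may be gone entirely for a destroyed OSD
   cat /var/lib/ceph/osd/ceph-33/keyring     # what the daemon is starting with

If those differ, or the auth entry is missing, cephx refuses the connection, and in my experience that can surface as this kind of confusing "allowed_methods" message.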


Now what do I need to do?

I would remove osd.33, even manually editing the CRUSH map if needed (that should not be necessary), and then redeploy this OSD and wait for recovery.
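
Roughly, and only as a sketch since I have not done this with cephadm on Octopus myself (hostname and device are placeholders, and with a separate block.db you will probably want an OSD service spec rather than the simple form):

   ceph orch device zap <host> /dev/sdX --force     # wipe the old LVs so the disk shows up as available again
   ceph orch daemon add osd <host>:/dev/sdX         # or let your existing OSD service spec pick the device up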


If you have not manually "destroyed" this OSD, then either things work differently in Octopus from what I have seen so far, my memory is failing me, or some really weird stuff is happening and I would really like to know what that is.

What version are you running? Do note that 15.2.10 has been released.
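
A quick way to check (and to see whether all daemons agree on the release):

   ceph versions    # per-daemon versions across the cluster
   ceph -v          # version of the local binaries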

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


