If we can't replace a drive on a node in a crash situation without blowing away the entire node.... it seems to me Ceph Octopus fails the "test" part of the "test cluster" :-/

I vaguely recall running into this "doesn't have PARTUUID" problem before. THAT time, I think I did end up wiping the entire machine. But to prepare for production use, I really need to have a better-documented method.

I note that I can't even fall back to "ceph-disk", since that is no longer in the distribution, it would seem. That would be the "easy" way to deal with this... but it is not here. (A rough sketch of the replacement flow I was hoping to find documented is appended at the bottom of this mail.)

----- Original Message -----
From: "Stefan Kooman" <stefan@xxxxxx>
To: "Philip Brown" <pbrown@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxx>
Sent: Friday, March 19, 2021 12:04:30 PM
Subject: Re: ceph octopus mysterious OSD crash

On 3/19/21 7:47 PM, Philip Brown wrote:
> I see.
>
> I don't think it works when 7/8 devices are already configured, and the
> SSD is already mostly sliced.

OK. If it is a test cluster you might just blow it all away. By doing this you are simulating an "SSD" failure taking down all HDDs with it. It sure isn't pretty. I would say the situation you ended up with is not a corner case by any means.

I am afraid I would really need to set up a test cluster with cephadm to help you further at this point, besides the suggestion above.

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
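
P.S. The kind of flow I was hoping to find documented looks roughly like the below. This is only a sketch: the OSD id (12), host name (cephnode1), replacement device (/dev/sdX) and the db VG/LV name are placeholders for whatever the tools actually report on the broken host, and I have not verified every step on Octopus.

    # 1) Admin node: take the dead OSD out and mark it for replacement, so
    #    its id stays reserved instead of being deleted outright.
    ceph osd out 12
    ceph orch osd rm 12 --replace
    ceph orch osd rm status                  # wait for the removal to finish

    # 2) On the OSD host (inside "cephadm shell"): note which db LV on the
    #    shared SSD belonged to osd.12.
    ceph-volume lvm list

    # 3) After swapping the physical HDD, wipe ONLY the new HDD, not the SSD.
    ceph orch device zap cephnode1 /dev/sdX --force

    # 4) Recreate the OSD with the old id, re-using the surviving db LV on
    #    the SSD (the VG/LV name below is made up; use what step 2 printed).
    ceph-volume lvm prepare --bluestore --osd-id 12 \
        --data /dev/sdX --block.db cephdb-vg/db-12

Whether that last step is enough for cephadm to start the daemon again, or whether an OSD service spec has to pick it up afterwards, is exactly the part I can't find spelled out anywhere.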