If we can't replace a drive on a node in a crash situation without blowing away the entire node.... it seems to me Ceph Octopus fails the "test" part of the "test cluster" :-/

I vaguely recall running into this "doesn't have PARTUUID" problem before. THAT time, I think I did end up wiping the entire machine. But to prepare for production use, I really need to have a better-documented method.

I note that I can't even fall back to "ceph-disk", since that is no longer in the distribution, it would seem. That would be the "easy" way to deal with this... but it is not here. (A rough sketch of the replacement flow I was hoping to find documented is appended at the bottom of this mail.)

----- Original Message -----
From: "Stefan Kooman" <stefan@xxxxxx>
To: "Philip Brown" <pbrown@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxx>
Sent: Friday, March 19, 2021 12:04:30 PM
Subject: Re: ceph octopus mysterious OSD crash

On 3/19/21 7:47 PM, Philip Brown wrote:
> I see.
>
> I don't think it works when 7/8 devices are already configured, and the
> SSD is already mostly sliced.

OK. If it is a test cluster you might just blow it all away. By doing this you are simulating an "SSD" failure taking down all HDDs with it. It sure isn't pretty. I would say the situation you ended up with is not a corner case by any means.

I am afraid I would really need to set up a test cluster with cephadm to help you further at this point, besides the suggestion above.

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
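
P.S. The kind of flow I was hoping to find documented looks roughly like the below. This is only a sketch: the OSD id (12), host name (cephnode1), replacement device (/dev/sdX) and the db VG/LV name are placeholders for whatever the tools actually report on the broken host, and I have not verified every step on Octopus.

    # 1) Admin node: take the dead OSD out and mark it for replacement, so
    #    its id stays reserved instead of being deleted outright.
    ceph osd out 12
    ceph orch osd rm 12 --replace
    ceph orch osd rm status                  # wait for the removal to finish

    # 2) On the OSD host (inside "cephadm shell"): note which db LV on the
    #    shared SSD belonged to osd.12.
    ceph-volume lvm list

    # 3) After swapping the physical HDD, wipe ONLY the new HDD, not the SSD.
    ceph orch device zap cephnode1 /dev/sdX --force

    # 4) Recreate the OSD with the old id, re-using the surviving db LV on
    #    the SSD (the VG/LV name below is made up; use what step 2 printed).
    ceph-volume lvm prepare --bluestore --osd-id 12 \
        --data /dev/sdX --block.db cephdb-vg/db-12

Whether that last step is enough for cephadm to start the daemon again, or whether an OSD service spec has to pick it up afterwards, is exactly the part I can't find spelled out anywhere.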