Hi,
seems like this tracker issue [1] already covers your question. I'll
update the issue and add a link to our thread.
[1] https://tracker.ceph.com/issues/57767
Zitat von Frank Schilder <frans@xxxxxx>:
Hi Eugen,
can you confirm that the silent corruption also happens on a
collocated OSD (everything on the same device) on pacific? The zap
command should simply exit with "osd not down+out", or at least not
do anything.
If this accidentally destructive behaviour is still present, I think
it is worth a ticket. Since I can't test on versions newer than
octopus yet, could you open the ticket?
Thanks!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 23 November 2022 09:27:22
To: ceph-users@xxxxxxx
Subject: Re: ceph-volume lvm zap destroyes up+in OSD
Hi,
I can confirm the behavior for Pacific version 16.2.7. I checked with
a Nautilus test cluster and there it seems to work as expected. I
tried to zap a db device and then restarted one of the OSDs,
successfully. So there seems to be a regression somewhere. I didn't
search for tracker issues yet, but this seems to be worth one, right?
Zitat von Frank Schilder <frans@xxxxxx>:
Hi all,
on our octopus-latest cluster I accidentally destroyed an up+in OSD
with the command
ceph-volume lvm zap /dev/DEV
It executed the dd command and then failed at the lvm commands with
"device busy". Problem number one is that the OSD continued working
fine, so there is no indication of corruption; it's a silent
corruption. Problem number two - the real one - is: why does
ceph-volume not check whether the OSD that device belongs to is still
up+in? "ceph osd destroy" does that, for example. I seem to
remember that "ceph-volume lvm zap --osd-id" also checks, but I'm
not sure.
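For illustration, the kind of pre-check described above could be sketched as a small shell wrapper around the ceph CLI. This is only a sketch, not ceph-volume's actual implementation: it assumes `ceph osd info <id> --format json` reports the `up`/`in` flags as 0 or 1, and the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to zap while the OSD is still up or in.
# Assumes "ceph osd info <id> --format json" reports {"up": 0|1, "in": 0|1};
# the function name is illustrative, not a real ceph command.
osd_is_down_and_out() {
  local info
  info=$(ceph osd info "$1" --format json 2>/dev/null) || return 1
  # Both flags must be 0 (down AND out) before destroying any data.
  echo "$info" | grep -q '"up": *0[,}]' && echo "$info" | grep -q '"in": *0[,}]'
}

# Usage sketch:
#   if osd_is_down_and_out 12; then
#     ceph-volume lvm zap --osd-id 12
#   else
#     echo "refusing to zap: osd.12 is not down+out" >&2
#   fi
```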
Has this been changed in versions later than octopus?
I think it is extremely dangerous to provide a tool that allows the
silent corruption of an entire ceph cluster. The corruption is only
discovered on restart, and then it is too late (unless there is
an unofficial recovery procedure somewhere).
I would prefer that ceph-volume lvm zap employed the same strict
sanity checks as other ceph commands to avoid accidents. In my case
it was a typo, a single wrong letter.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx