Hi Eugen,

can you confirm that the silent corruption also happens on a collocated OSD (everything on the same device) on Pacific? The zap command should simply exit with "osd not down+out", or at least not do anything. If this accidentally destructive behaviour is still present, I think it is worth a ticket. Since I can't test on versions higher than Octopus yet, could you open the ticket?

Thanks!

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 23 November 2022 09:27:22
To: ceph-users@xxxxxxx
Subject: Re: ceph-volume lvm zap destroyes up+in OSD

Hi,

I can confirm the behavior for Pacific version 16.2.7. I checked with a Nautilus test cluster, and there it seems to work as expected: I tried to zap a db device and then successfully restarted one of the OSDs. So there seems to be a regression somewhere. I haven't searched for tracker issues yet, but this seems to be worth one, right?

Quoting Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> on our octopus-latest cluster I accidentally destroyed an up+in OSD
> with the command line
>
>     ceph-volume lvm zap /dev/DEV
>
> It executed the dd command and then failed at the lvm commands with
> "device busy". Problem number one is that the OSD continued working
> fine. Hence, there is no indication of a corruption; it's a silent
> corruption. Problem number two, the real one, is: why is
> ceph-volume not checking whether the OSD that device belongs to is
> still up+in? "ceph osd destroy" does that, for example. I believe
> "ceph-volume lvm zap --osd-id" also checks, but I'm not sure.
>
> Has this been changed in versions later than Octopus?
>
> I think it is extremely dangerous to provide a tool that allows the
> silent corruption of an entire ceph cluster. The corruption is only
> discovered on restart, and then it is too late (unless there is an
> unofficial recovery procedure somewhere).
>
> I would prefer that ceph-volume lvm zap employ the same strict
> sanity checks as other ceph commands to avoid accidents. In my case
> it was a typo, one wrong letter.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
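Until ceph-volume itself refuses to zap a device that still backs a live OSD, the sanity check requested in this thread (exit with "osd not down+out" instead of zapping) could be approximated with a small wrapper. The following is a minimal hypothetical sketch, not part of ceph-volume: the helper names (osd_state, osd_ids_for_device, zap_allowed, zap_if_safe) are made up for illustration, and it assumes the JSON layouts of `ceph osd dump --format json` (an `osds` list whose entries carry `osd`, `up` and `in`) and `ceph-volume lvm list --format json` (a dict keyed by OSD id whose LV entries carry `devices` and `lv_path`).

```python
"""Hypothetical pre-zap safety check (a sketch, NOT part of Ceph).

Assumes the JSON layouts of `ceph osd dump --format json` and
`ceph-volume lvm list --format json` described in the lead-in.
"""
import json
import subprocess
import sys


def osd_state(osd_dump, osd_id):
    """Return (up, in) for osd_id from parsed `ceph osd dump` JSON."""
    for osd in osd_dump.get("osds", []):
        if osd.get("osd") == osd_id:
            return bool(osd.get("up")), bool(osd.get("in"))
    raise KeyError(f"osd.{osd_id} not found in osd dump")


def osd_ids_for_device(lvm_list, device):
    """Return sorted OSD ids with an LV on `device` (exact path match only).

    A shared db device can belong to several OSDs, so this may return
    more than one id.
    """
    ids = set()
    for osd_id, lvs in lvm_list.items():
        for lv in lvs:
            if device in lv.get("devices", []) or lv.get("lv_path") == device:
                ids.add(int(osd_id))
    return sorted(ids)


def zap_allowed(osd_dump, lvm_list, device):
    """Return (False, reason) if any OSD on `device` is still up or in."""
    for osd_id in osd_ids_for_device(lvm_list, device):
        is_up, is_in = osd_state(osd_dump, osd_id)
        if is_up or is_in:
            return False, f"osd.{osd_id} not down+out"
    return True, "ok"


def _run_json(cmd):
    """Run a command and parse its stdout as JSON."""
    out = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(out.stdout)


def zap_if_safe(device):
    """Query the cluster, then zap `device` only if no live OSD uses it."""
    dump = _run_json(["ceph", "osd", "dump", "--format", "json"])
    lvm = _run_json(["ceph-volume", "lvm", "list", "--format", "json"])
    ok, reason = zap_allowed(dump, lvm, device)
    if not ok:
        sys.exit(f"refusing to zap {device}: {reason}")
    subprocess.run(["ceph-volume", "lvm", "zap", device], check=True)
```

For example, calling zap_if_safe("/dev/DEV") on a device that still backs an up or in OSD would exit with "refusing to zap /dev/DEV: osd.N not down+out" instead of running the zap. Device matching is by exact path only, so a partition path such as /dev/sdb1 will not match an entry listed as /dev/sdb; a real check inside ceph-volume would have to resolve such cases.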