Re: ceph-volume lvm zap destroys up+in OSD

Hi Eugen,

can you confirm that the silent corruption also happens on a collocated OSD (everything on the same device) on Pacific? The zap command should simply exit with "osd not down+out" or at least not do anything.

If this accidentally destructive behaviour is still present, I think it is worth a ticket. Since I can't test on versions higher than octopus yet, could you then open the ticket?
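
To make the requested check concrete, here is a minimal sketch of the guard I have in mind, not the actual ceph-volume logic. It assumes the parsed JSON output of "ceph-volume lvm list --format json" (a dict keyed by OSD id, each entry holding LVs with a "devices" list) and of "ceph osd dump --format json" (an "osds" list with "osd", "up" and "in" fields). Those field names are my assumption about the output layout and should be verified on the version in use; the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def osds_on_device(lvm_list: Dict[str, Any], device: str) -> Set[int]:
    """Return the IDs of all OSDs that have at least one LV on `device`.

    `lvm_list` is assumed to look like the parsed output of
    `ceph-volume lvm list --format json` (layout is an assumption).
    """
    ids: Set[int] = set()
    for osd_id, lvs in lvm_list.items():
        for lv in lvs:
            if device in lv.get("devices", []):
                ids.add(int(osd_id))
    return ids


def blocking_osds(osd_dump: Dict[str, Any], osd_ids: Iterable[int]) -> List[int]:
    """Return the sorted IDs from `osd_ids` that are still up or in.

    `osd_dump` is assumed to look like the parsed output of
    `ceph osd dump --format json` (layout is an assumption).
    OSD IDs unknown to the cluster map are not treated as blockers.
    """
    state = {o["osd"]: o for o in osd_dump.get("osds", [])}
    return sorted(
        i for i in osd_ids
        if i in state and (state[i].get("up") or state[i].get("in"))
    )


def safe_to_zap(
    lvm_list: Dict[str, Any], osd_dump: Dict[str, Any], device: str
) -> Tuple[bool, List[int]]:
    """Allow a zap only if every OSD using `device` is down+out."""
    blockers = blocking_osds(osd_dump, osds_on_device(lvm_list, device))
    return (not blockers, blockers)
```

A wrapper would call safe_to_zap() with the two parsed JSON documents and refuse to run the zap, printing something like "osd not down+out", whenever the returned list of blocking OSD ids is non-empty. Note that a shared db device maps to several OSDs, so all of them have to be down+out.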

Thanks!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 23 November 2022 09:27:22
To: ceph-users@xxxxxxx
Subject: Re: ceph-volume lvm zap destroys up+in OSD

Hi,

I can confirm the behavior for Pacific version 16.2.7. I checked with
a Nautilus test cluster and there it seems to work as expected. I
tried to zap a db device and then restarted one of the OSDs,
successfully. So there seems to be a regression somewhere. I haven't
searched for tracker issues yet, but this seems to be worth one, right?

Quoting Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> on our octopus-latest cluster I accidentally destroyed an up+in OSD
> with the command line
>
>   ceph-volume lvm zap /dev/DEV
>
> It executed the dd command and then failed at the lvm commands with
> "device busy". Problem number one is that the OSD continued working
> fine. Hence, there is no indication of corruption; it is a silent
> corruption. Problem number two - the real one - is: why is
> ceph-volume not checking whether the OSD that the device belongs to is
> still up+in? "ceph osd destroy" does that, for example. I seem to
> remember that "ceph-volume lvm zap --osd-id" also checks, but I'm
> not sure.
>
> Has this been changed in versions later than octopus?
>
> I think it is extremely dangerous to provide a tool that allows the
> silent corruption of an entire ceph cluster. The corruption is only
> discovered on restart, and by then it would be too late (unless there
> is an unofficial recovery procedure somewhere).
>
> I would prefer that ceph-volume lvm zap employs the same strict
> sanity checks as other ceph-commands to avoid accidents. In my case
> it was a typo, one wrong letter.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx