Re: ceph-volume lvm zap destroys up+in OSD


 



Thanks, also for finding the related tracker issue! It looks like a fix has already been approved. Hope it shows up in the next release.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 28 November 2022 10:58:31
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Re: ceph-volume lvm zap destroys up+in OSD

Hi,

seems like this tracker issue [1] already covers your question. I'll
update the issue and add a link to our thread.

[1] https://tracker.ceph.com/issues/57767


Zitat von Frank Schilder <frans@xxxxxx>:

> Hi Eugen,
>
> can you confirm that the silent corruption also happens on a
> collocated OSD (everything on the same device) on Pacific? The zap
> command should simply exit with "OSD not down+out", or at least not
> do anything.
>
> If this accidentally destructive behaviour is still present, I think
> it is worth a ticket. Since I can't test on versions higher than
> Octopus yet, could you open the ticket?
>
> Thanks!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: 23 November 2022 09:27:22
> To: ceph-users@xxxxxxx
> Subject: Re: ceph-volume lvm zap destroys up+in OSD
>
> Hi,
>
> I can confirm the behavior for Pacific version 16.2.7. I checked with
> a Nautilus test cluster and there it seems to work as expected. I
> tried to zap a db device and then restarted one of the OSDs,
> successfully. So there seems to be a regression somewhere. I didn't
> search for tracker issues yet, but this seems to be worth one, right?
>
> Zitat von Frank Schilder <frans@xxxxxx>:
>
>> Hi all,
>>
>> on our cluster (latest Octopus) I accidentally destroyed an up+in
>> OSD with the command
>>
>>   ceph-volume lvm zap /dev/DEV
>>
>> It executed the dd command and then failed at the lvm commands with
>> "device busy". Problem number one: the OSD continued working fine,
>> so there is no indication of corruption; it is a silent corruption.
>> Problem number two, the real one: why does ceph-volume not check
>> whether the OSD the device belongs to is still up+in? "ceph osd
>> destroy" does that, for example. I believe I remember that
>> "ceph-volume lvm zap --osd-id" also checks, but I'm not sure.
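[Editor's note: the missing down+out guard described above could be sketched roughly as follows; the helper function and its inputs are illustrative assumptions, not actual ceph-volume code.]

```shell
#!/bin/sh
# Hypothetical pre-zap guard: refuse to touch a device whose OSD is
# not both down and out. In a real check, the status words would come
# from the cluster, e.g. parsed out of `ceph osd tree --format json`.
safe_to_zap() {
    status="$1"   # "up" or "down"
    inout="$2"    # "in" or "out"
    [ "$status" = "down" ] && [ "$inout" = "out" ]
}

# Example: an up+in OSD must be refused.
if safe_to_zap "up" "in"; then
    echo "zapping"
else
    echo "refusing to zap: OSD not down+out" >&2
fi
```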
>>
>> Has this been changed in versions later than octopus?
>>
>> I think it is extremely dangerous to provide a tool that allows
>> silent corruption of an entire Ceph cluster. The corruption is only
>> discovered on restart, and by then it is too late (unless there is
>> an unofficial recovery procedure somewhere).
>>
>> I would prefer that ceph-volume lvm zap employ the same strict
>> sanity checks as other Ceph commands to avoid accidents. In my case
>> it was a typo, one wrong letter.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>






