Re: zap an osd and it appears again

Hi Luis,

Was the OSD spec responsible for creating this OSD set to unmanaged? Having
cephadm pick available disks back up is the expected behavior right now (see
https://docs.ceph.com/en/latest/cephadm/services/osd/#declarative-state),
although we've been considering changing this, since in the majority of
cases users want to pick up only the disks available at apply time, not
every matching disk forever. But if you have set the service to unmanaged
and it's still picking up the disks, that's a different issue entirely.
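
In the meantime, exporting the OSD spec, adding "unmanaged: true", and
re-applying it should keep cephadm from consuming the zapped disk until you
flip it back. Roughly along these lines (just a sketch; the service_id
default_drives is taken from the drivegroup name in your log, so adjust it
to whatever the export actually shows):

  # dump the current OSD spec(s) and note the service_id
  ceph orch ls osd --export > osd_spec.yml

  # edit the file so the spec carries "unmanaged: true", e.g.:
  #
  #   service_type: osd
  #   service_id: default_drives
  #   unmanaged: true
  #   placement:
  #     host_pattern: '*'
  #   spec:
  #     data_devices:
  #       all: true

  # re-apply the edited spec
  ceph orch apply -i osd_spec.yml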

Thanks,
  - Adam King

On Tue, Apr 26, 2022 at 8:16 AM Luis Domingues <luis.domingues@xxxxxxxxx>
wrote:

> Hi all,
>
> We got hit by the same bug while doing some testing with cephadm on a test
> cluster.
>
> The installed Ceph version is 16.2.7; we use the orchestrator with cephadm
> but no dashboard.
>
> We tried to remove an osd using ceph orch osd rm 2 --zap.
>
> The osd was drained normally, but right after the disk was zapped the
> orchestrator added it back to the cluster:
>
> 2022-04-26T09:29:31.740508+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5143 :
> cephadm [INF] osd.2 crush weight is 0.4882965087890625
> 2022-04-26T09:29:32.678558+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5145 :
> cephadm [INF] osd.2 weight is now 0.0
> 2022-04-26T09:41:42.479593+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5548 :
> cephadm [INF] osd.2 now down
> 2022-04-26T09:41:42.479832+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5549 :
> cephadm [INF] Removing daemon osd.2 from
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:41:44.650366+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5551 :
> cephadm [INF] Removing key for osd.2
> 2022-04-26T09:41:44.661482+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5552 :
> cephadm [INF] Successfully removed osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:41:44.675287+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5553 :
> cephadm [INF] Successfully purged osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:41:44.675430+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5554 :
> cephadm [INF] Zapping devices for osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:41:46.629752+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5556 :
> cephadm [INF] Successfully zapped devices for osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:42:03.331285+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5565 :
> cephadm [INF] Deploying daemon osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
>
>
> We then tried the same thing with --replace added to the command:
> ceph orch osd rm 2 --replace --zap
>
> Here it is no better: the osd was removed and then re-added almost
> instantly. Here is the cephadm log:
>
> 2022-04-26T09:55:21.478379+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5969 :
> cephadm [INF] osd.2 now out
> 2022-04-26T09:55:30.327466+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5982 :
> cephadm [INF] osd.2 now down
> 2022-04-26T09:55:30.327611+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5983 :
> cephadm [INF] Removing daemon osd.2 from
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:55:33.099252+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5986 :
> cephadm [INF] Removing key for osd.2
> 2022-04-26T09:55:33.117638+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5987 :
> cephadm [INF] Successfully removed osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:55:33.133074+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5988 :
> cephadm [INF] Successfully destroyed old osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal; ready for replacement
> 2022-04-26T09:55:33.133133+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5989 :
> cephadm [INF] Zapping devices for osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:55:35.432259+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5991 :
> cephadm [INF] Successfully zapped devices for osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:55:35.448361+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5992 :
> cephadm [INF] Found osd claims -> {'ip-10-12-0-98': ['2']}
> 2022-04-26T09:55:35.448466+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 5993 :
> cephadm [INF] Found osd claims for drivegroup default_drives ->
> {'ip-10-12-0-98': ['2']}
> 2022-04-26T09:55:54.573100+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 6004 :
> cephadm [INF] Deploying daemon osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T09:56:04.147451+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 6009 :
> cephadm [INF] Detected new or changed devices on
> ip-10-12-0-98.eu-central-1.compute.internal
>
>
> We tried one last time using --force: ceph orch osd rm 2 --replace --zap
> --force, and we still saw the same behavior.
>
>
> 2022-04-26T12:00:13.208680+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 9737 :
> cephadm [INF] osd.2 crush weight is 0.4882965087890625
> 2022-04-26T12:00:13.885870+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 9738 :
> cephadm [INF] osd.2 weight is now 0.0
> 2022-04-26T12:12:16.651546+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10127 :
> cephadm [INF] osd.2 now down
> 2022-04-26T12:12:16.651984+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10128 :
> cephadm [INF] Removing daemon osd.2 from
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T12:12:18.775694+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10130 :
> cephadm [INF] Removing key for osd.2
> 2022-04-26T12:12:18.785669+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10131 :
> cephadm [INF] Successfully removed osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T12:12:18.799167+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10132 :
> cephadm [INF] Successfully purged osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T12:12:18.799305+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10133 :
> cephadm [INF] Zapping devices for osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T12:12:20.668898+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10135 :
> cephadm [INF] Successfully zapped devices for osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
> 2022-04-26T12:12:37.813769+0000
> mgr.ip-10-12-0-209.eu-central-1.compute.internal.pjsjcm (mgr.14116) 10144 :
> cephadm [INF] Deploying daemon osd.2 on
> ip-10-12-0-98.eu-central-1.compute.internal
>
>
> This is quite bad, because when we zap a disk we want the orchestrator to
> at least leave the disk empty until it is unplugged.
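>
> The drivegroup that keeps re-claiming the disk shows up in the log above
> ("Found osd claims for drivegroup default_drives"); presumably setting
> that specific spec to unmanaged before zapping is the only way to keep
> the disk empty for now. The spec can be listed and exported with, for
> example:
>
> ceph orch ls osd
> ceph orch ls osd --export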
>
> Luis Domingues
> Proton AG
>
>
> ------- Original Message -------
> On Thursday, March 31st, 2022 at 09:32, Dhairya Parmar <dparmar@xxxxxxxxxx>
> wrote:
>
>
> > Can you try using the --force option with your command?
> >
> > On Thu, Mar 31, 2022 at 1:25 AM Alfredo Rezinovsky alfrenovsky@xxxxxxxxx
> >
> > wrote:
> >
> > > I want to create osds manually
> > >
> > > If I zap the osd 0 with:
> > >
> > > ceph orch osd rm 0 --zap
> > >
> > > as soon as the dev is available the orchestrator creates it again
> > >
> > > If I use:
> > >
> > > ceph orch apply osd --all-available-devices --unmanaged=true
> > >
> > > and then zap the osd.0 it also appears again.
> > >
> > > Is there a way to disable the orch apply persistence, or to disable it
> > > temporarily?
> > >
> > > --
> > > Alfrenovsky
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


