On Wed, Oct 24, 2018 at 6:46 PM Lars Marowsky-Bree <lmb@xxxxxxxx> wrote:
>
> On 2018-10-24T12:37:50, Sebastian Wagner <swagner@xxxxxxx> wrote:
>
> > >> 3) Delegate this to the new orchestrator. Kube can just run this command
> > >> wherever we want. Ansible presumably can too.
> > >
> > > I agree with Kai and Jan that this is the way to go.
> > >
> > > Using the libstoragemgmt network service is still possible, it would
> > > just be an implementation detail for the orchestrator itself. I can
> > > imagine that in some future container environments, deploying
> > > something like the libstoragemgmt network service becomes quite
> > > cheap/easy, and saves the effort of tools like Rook implementing their
> > > own agent hooks -- but Ceph won't care.
> > >
> > > This would be an area where we need to get the orchestrator's device
> > > names in line with Ceph's internal device naming -- that would be
> > > useful anyway for other orchestrator functionality.
> >
> > Yes. As the OSD may no longer be running, and thus out of reach for
> > Ceph, we should use the orchestrator for that.
>
> I feel stupid for agreeing with this yet again, but also exactly because
> in a containerized world (where maybe only the specific LV is exposed to
> the pod) ceph-osd might not be able to even.
>
> And finally, we may want to be able to blink which disk we are *about to
> provision*, or which slot to plug the disk into, so in ceph-volume setup
> stages. Anything that relies on the ceph-osd process is a bit flawed.
>
> And the orchestrator has all the access privileges because it needs them
> anyway.
>
> > > I'd be inclined to just make the command synchronous, and return an
> > > error if the host is unreachable (perhaps with a special force flag to
> > > clear out Ceph's state if the host is gone and never coming back).
>
> Actually this should probably be built with the extension to the whole
> node in mind! In case of a total node failure in a 500 node DC, this
> might need to be blinked just as well.

Do you mean blinking the node when it's running but none of its OSDs
are, or blinking a fully offline node using lights-off management
layers?

John

>
> Regards,
>     Lars
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Architects should open possibilities and not determine everything." (Ueli Zbinden)