Re: cephadm next steps

Sebastian Wagner <swagner@xxxxxxxx> · Tue, 11 Feb 2020 17:47:49 +0100

On 08.02.20 at 17:25, Sage Weil wrote:

> The serve() one is the most important, IMO: we need it to (1) be parallel, 
> (2) gracefully handle errors for each host and raise appropriate health 
> alerts, and (3) update the cache as appropriate.  For the CLI case, 
> whether it triggers the scrape synchronously or somehow kicks serve() and 
> waits is a probably-not-so-important detail.

My only concern is that we have to prevent scraping in parallel from
serve() and from the CLI: we simply don't have enough connections to
spare. I've seen this with other calls as well: if serve() is busy with
a background task, the CLI basically hangs.
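
To make the idea concrete, here is a rough sketch of how scrapes for one
host could be serialized so that serve() and the CLI never scrape it at the
same time. All names (ScrapeGate, get, max_age) are made up for
illustration and are not existing cephadm code; the scrape callable stands
in for whatever actually talks to the host.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class ScrapeGate:
    """Allow at most one scrape per host at a time (hypothetical helper).

    A caller that had to wait for a running scrape reuses that scrape's
    result instead of opening another connection to the same host.
    """

    def __init__(self, scrape: Callable[[str], dict]) -> None:
        self._scrape = scrape
        self._lock = threading.Lock()  # protects the two dicts below
        self._host_locks: Dict[str, threading.Lock] = {}
        self._cache: Dict[str, Tuple[float, dict]] = {}  # host -> (finished_at, data)

    def _host_lock(self, host: str) -> threading.Lock:
        with self._lock:
            return self._host_locks.setdefault(host, threading.Lock())

    def get(self, host: str, max_age: float = 0.0) -> dict:
        asked_at = time.monotonic()
        with self._host_lock(host):  # blocks while another caller scrapes this host
            with self._lock:
                entry = self._cache.get(host)
            if entry is not None:
                finished_at, data = entry
                # Reuse a scrape that completed while we were waiting, or a
                # cached result that is still young enough for this caller.
                if finished_at >= asked_at or time.monotonic() - finished_at <= max_age:
                    return data
            data = self._scrape(host)
            with self._lock:
                self._cache[host] = (time.monotonic(), data)
            return data
```

With something like this, serve() could call get(host) on its regular
schedule while the CLI calls get(host, max_age=...) and, in the worst case,
waits for the scrape that is already running instead of starting a second
one.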

> 
> On the other hand, the remaining internal _get_services() callers should I 
> think all just use the latest cached state.

+1

> Right now the way the code is 
> structured makes it very confusing which path is used for which, and the 
> use of the async_map_completion helper (currently, at least) makes it hard 
> to tell which host failed.

The exception (admittedly with very little detail) should be forwarded to
the completion.
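
As a sketch of what "tell which host failed" could look like: the function
below scrapes hosts in parallel, updates a cache for the hosts that
succeeded, and returns the exception per failing host. It uses only the
standard library; scrape_hosts, scrape and cache are hypothetical names,
not the current async_map_completion code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple


def scrape_hosts(
    hosts: Iterable[str],
    scrape: Callable[[str], dict],
    cache: Dict[str, dict],
    max_workers: int = 10,
) -> Dict[str, Exception]:
    """Scrape all hosts in parallel and return a host -> exception map.

    Successful results replace the cache entry for that host. A failing
    host keeps its previous cache entry and is reported in the result.
    """

    def scrape_one(host: str) -> Tuple[str, Optional[dict], Optional[Exception]]:
        # Catch per host so that one broken host cannot abort the others.
        try:
            return host, scrape(host), None
        except Exception as e:
            return host, None, e

    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for host, data, err in pool.map(scrape_one, list(hosts)):
            if err is None:
                cache[host] = data
            else:
                failures[host] = err
    return failures
```

The returned map could then feed a health alert from serve(), or be turned
into a single error on the completion that names every failed host, so the
CLI user sees more than a bare exception.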

> 
> As for additional services (monitoring, nfs, etc.), I think that can 
> proceed more quickly once we have the CLI and add/remove/update issues 
> sorted out.  I may start with an RFC PR on that, but I would really 
> like some feedback on whether the proposal makes sense.

https://github.com/ceph/ceph/pull/33205/files should also help with new
services.

> 
> Thanks!
> sage
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Managing Director: Felix Imendörffer
