On Tue, 20 Jun 2017, Gregory Farnum wrote:
> On Mon, Jun 19, 2017 at 12:26 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > I wrote up a quick proposal at
> >
> >     http://pad.ceph.com/p/service-map
> >
> > Basic idea:
> >
> > - generic ServiceMap of service -> daemon -> metadata and status
> > - managed/persisted by mon
> > - librados interface to register as service X name Y (e.g., 'rgw.foo')
> > - librados will send regular beacon to mon to keep entry alive
> > - various mon commands to dump all or part of the service map
>
> I am deeply uncomfortable with putting this stuff into the monitor (at
> least, directly). The main purpose we've discussed is to enable manager
> dashboard display of these services, along with stats collection, and
> there's no reason for that to go anywhere other than the manager -- in
> fact, routing it through the monitor is inimical to timely updates of
> statistics. Why do you want to do that instead of letting it be handled
> by the manager, which can aggregate and persist whatever data it likes
> in a convenient form -- and in ways which are mindful of monitor IO
> abilities?

Well, I argued for doing this in the mon this morning, but after
implementing the first half of it I'm thinking the mgr makes more sense.

I thought the mon made sense because

- it's a persistent structure that should remain consistent across mgr
  restarts etc,
- it looks just like OSDMap and FSMap, just a bit more freeform, and
  those are in the mon,
- if it's stored on the mon, there's no particular reason the mgr needs
  to be involved at all.

The main complaint was around the 'status' map, which may update
semi-frequently; does that need to be persisted?  (I'd argue that most
things that change very frequently are probably best covered by
perfcounters or something other than this globally visible service map.
But some ad hoc status information is definitely useful, so...)

But... after writing a ServiceMap and ServiceMonitor skeleton, it's time
to implement the beacon, and I'd prefer to do that using MMonCommand to
(1) make it usable and testable via the cli (i.e., a well-written bash
script could be a service if it wanted to), and (2) avoid writing new
messages that aren't really needed.  And new commands can be trivially
implemented on the mgr.  In python.

Also, the get_health etc hooks in ServiceMonitor made me think we will
want some per-service logic around this stuff.  Like, issue a health
warning if fewer than my target of 5 radosgws are running.  Writing
per-service pluggable logic is also a good fit for ceph-mgr (rough
sketch at the end of this mail).

Also, the contents of ServiceMap can just be a section of config-key and
trivially visible to all, without any special code.  This also seems
convenient (albeit more fragile).

If it goes in the mgr, though, I assume we'll have a split between what
is persisted (in config-key or elsewhere) and what is ephemeral status
information.

I expect this whole thing is easiest to implement as a mgr_module, but
I'm not sure we have a way to share unpersisted state between modules?
Perhaps a config-key-like interface, but local only to the mgr instance,
is all we need there (also sketched below).

sage
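
P.S. For concreteness, here is a rough python sketch of the kind of
per-service pluggable check I have in mind.  The map shape (service ->
daemon -> metadata/status) follows the pad proposal above; the function
and field names are just placeholders, not a real mgr module API:

    # Hypothetical service map contents, shaped
    #   service -> daemon -> {metadata, status}
    # as in the pad proposal.  All values are made up.
    service_map = {
        'rgw': {
            'foo': {
                'metadata': {'hostname': 'rgw1', 'zone': 'default'},
                'status': {'state': 'active'},
            },
            'bar': {
                'metadata': {'hostname': 'rgw2', 'zone': 'default'},
                'status': {'state': 'active'},
            },
        },
    }

    def check_rgw_count(service_map, target=5):
        """Warn if fewer than `target` rgw daemons are registered.
        This is the sort of per-service logic a ceph-mgr module could
        implement in python instead of hard-coding it in a mon
        ServiceMonitor."""
        running = len(service_map.get('rgw', {}))
        if running < target:
            return ('HEALTH_WARN',
                    '%d/%d rgw daemons running' % (running, target))
        return ('HEALTH_OK', '')

    print(check_rgw_count(service_map))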
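
And the "config-key-like interface but local only to the mgr instance"
bit could be as simple as something like this (again, just a sketch --
none of these names exist anywhere yet):

    import threading

    class MgrLocalStore(object):
        """In-memory key/value store shared by the modules of a single
        mgr instance.  Nothing here is persisted; it just papers over
        the lack of a way to share unpersisted state between modules."""
        def __init__(self):
            self._lock = threading.Lock()
            self._data = {}

        def set(self, key, value):
            with self._lock:
                self._data[key] = value

        def get(self, key, default=None):
            with self._lock:
                return self._data.get(key, default)

    store = MgrLocalStore()
    store.set('service_map/rgw/foo', {'state': 'active'})
    print(store.get('service_map/rgw/foo'))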