Re: Ideas for new ceph-mgr service

Hi,


----- Original Message -----
> From: "John Spray" <jspray@xxxxxxxxxx>
> To: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, January 13, 2016 8:13:27 AM
> Subject: Ideas for new ceph-mgr service
> 
> Hi all,
> 
> We currently have an unfulfilled need for a high level
> management/monitoring service that can take some of the non-essential
> tasks away from the mon (like handling the volume of advisory pg
> stats), and provide a place to implement new features (like
> cluster-wide command and control of the new scrub stuff in cephfs).

I (and our group as a whole) think this will be a HUGE win; it's something we've talked about conceptually for years.  Thank you sincerely for proposing and prototyping this!

> 
> We've had a couple of attempts in this area historically:
>  * ceph-rest-api, which is a stateless HTTP wrapper around the
> MonCommand interface.  All calls hit the mons directly; it's really
> just a protocol converter, and more RPC than REST.
>  * Calamari, which has a very extensible architecture, but suffers
> from being rather heavyweight, with lots of dependencies like its own
> database, and requires its own separate agents running on all the Ceph
> servers.
> 
> So, the idea is to create a new lightweight service (ceph-mgr) that
> runs alongside the mon, and uses the existing Ceph network channels to
> talk to remote hosts.  The address of this service would be published
> in the OSDMap, and OSDs and other daemons would send their
> non-essential stats to the mgr instead of the mon.  For HA we would
> probably run a mgr alongside each mon, and use whichever mgr instance
> lived with the current leader mon.
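
For concreteness, the failover rule might look something like the sketch below; MgrInstance, pick_active_mgr, and the leader lookup are all invented names for illustration, not actual Ceph code:

    // Hypothetical sketch of "use whichever mgr instance lives with the
    // current leader mon"; types and names are illustrative only.
    #include <string>
    #include <vector>

    struct MgrInstance {
      std::string host;  // host the mgr runs on
      std::string addr;  // address published in the OSDMap
    };

    // leader_host: host of the current leader mon, as found in the monmap.
    const MgrInstance* pick_active_mgr(const std::vector<MgrInstance>& mgrs,
                                       const std::string& leader_host) {
      for (const auto& m : mgrs)
        if (m.host == leader_host)   // prefer the mgr beside the leader
          return &m;
      return mgrs.empty() ? nullptr : &mgrs[0];  // fallback: any mgr
    }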
> 
> Internally, the mgr itself then has three main components:
>  * The server (a Messenger), which receives telemetry from daemons
> elsewhere in the system, and receives cluster map updates from the mon
>  * A simple in-memory store of all the structures that we receive from
> the cluster (the maps, the daemon metadata, the pg stats)
>  * An embedded python interpreter that hosts high level functionality
> like a REST API.
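
A rough skeleton of how those three pieces could hang together is below; every name here is invented for illustration and is not the prototype's actual API:

    // Illustrative-only skeleton of the three mgr components.  Assumes
    // the Python development headers; none of these are real Ceph classes.
    #include <Python.h>
    #include <map>
    #include <mutex>
    #include <string>

    // 2. Simple in-memory store of what we receive from the cluster.
    struct ClusterState {
      std::mutex lock;
      // stand-ins for the real OSDMap, daemon metadata, pg stats, ...
      std::map<std::string, std::string> daemon_metadata;
    };

    // 3. Embedded CPython interpreter hosting high-level modules.
    struct PyModules {
      PyModules() {
        Py_Initialize();
        // load a hypothetical Python module that serves a REST API
        // (a real version would run modules on their own thread)
        PyRun_SimpleString("import rest_api; rest_api.serve()");
      }
      ~PyModules() { Py_Finalize(); }
    };

    // 1. The server: in the real thing this would be a Messenger
    // Dispatcher whose dispatch callback feeds telemetry and map
    // updates into ClusterState.
    struct MgrServer {
      ClusterState state;
      PyModules py;
    };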
> 
> The mgr embodies the interface between "C++ Ceph land" (cephx auth,
> Messenger, and ::encode/::decode serialization) and "admin land"
> (JSON-like structures, REST APIs, Python modules).  The reason for
> doing this in one process, rather than putting the Python parts in a
> separate service (like calamari) is twofold:
>  * Code simplicity: avoid inventing a C++->Python network API that
> re-implements things like cluster map subscription and incremental
> OSDmaps.
>  * Efficiency: transmit data in its native encoding, hold it in memory
> in native structs, and only expose what's needed up into Python-land
> at runtime.

I defer to your intuition on keeping this colocated with the ceph-mon process (there are obviously a ton of reasons to do this).

I would -strongly- request that we not use a hybrid C++ & Python server as a production version of this capability.  If the proof of concept is as successful as I intuit, I think it would be highly desirable to design a scalable, native-code framework for the core management service runtime.

Honestly, I think any apparent advantage from the flexibility of Cython interfacing is strongly outweighed by the drawbacks of supporting the hybrid interfaces, not to mention the pervasive serialization and latency properties of a Python-driven runtime model.  (That's not to say that Python shouldn't be used to implement routines called from a core management runtime, if you strongly prefer not to run such code out-of-process [as systems like Salt, IIRC, do].)

Matt

> 
> That last part involves a bit of a trick: because Python (specifically
> the CPython interpreter) is so flexible, we can do neat things like
> implementing functions in C++ that have access to our native Ceph data
> structures, but are callable from high level Python code.  We can also
> cast our C++ structures into Python dicts directly, without an
> intermediate JSON step, using a magic Formatter subclass that
> generates python objects instead of serializing.  In general the
> PyFormatter is still not quite as efficient as writing full-blown
> wrappers for C++ structures, but it's way more efficient than
> serializing stuff to JSON and sending it over the network.
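
To make the trick concrete, here is a toy version of the dict-building Formatter idea; the real PyFormatter's interface and reference-counting details certainly differ, so treat this purely as a sketch (it also assumes Py_Initialize() has already run):

    // Toy Formatter that builds Python dicts directly instead of
    // emitting JSON text.  Sketch only; not the real PyFormatter.
    #include <Python.h>
    #include <string>
    #include <vector>

    class PyDictFormatter {
      std::vector<PyObject*> stack;  // innermost open section last
    public:
      PyDictFormatter() { stack.push_back(PyDict_New()); }

      void open_object_section(const char* name) {
        PyObject* d = PyDict_New();
        PyDict_SetItemString(stack.back(), name, d);
        Py_DECREF(d);            // parent dict now holds the reference
        stack.push_back(d);
      }
      void close_section() { stack.pop_back(); }

      void dump_int(const char* name, long v) {
        PyObject* o = PyLong_FromLong(v);
        PyDict_SetItemString(stack.back(), name, o);
        Py_DECREF(o);
      }
      void dump_string(const char* name, const std::string& v) {
        PyObject* o = PyUnicode_FromString(v.c_str());
        PyDict_SetItemString(stack.back(), name, o);
        Py_DECREF(o);
      }
      // Caller takes ownership of the root dict.
      PyObject* get() { return stack.front(); }
    };

    // Usage sketch: the dump-into-a-Formatter pattern, minus the JSON step.
    PyObject* osd_stat_to_py(/* const osd_stat_t& s */) {
      PyDictFormatter f;
      f.open_object_section("osd_stat");
      f.dump_int("kb_used", 42);      // real code would use s.kb_used
      f.dump_string("state", "up");
      f.close_section();
      return f.get();
    }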
> 
> Most of the business logic would then be written in python.  This
> would include the obvious status/health REST APIs, but potentially
> also things like pool management (similar to how the Calamari API
> handles these).  As well as being accessible via a REST API, the stats
> that live in the mgr could also be streamed on to a full-featured
> time-series database like InfluxDB, for users that want to deploy that
> kind of thing.  Our service would store some very recent history, so
> that folks without a full-featured TSDB can still load things like the last
> 60s of bandwidth into a graph in their GUI, if they have a GUI that
> uses our API.
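
The "very recent history" piece could be as small as a ring buffer of one-second samples per counter; this sketch (all names invented) keeps the last 60 seconds so a GUI can graph bandwidth without an external TSDB:

    // Fixed-size window of one-second bandwidth samples; sketch only.
    #include <array>
    #include <cstdint>

    struct StatWindow {
      static constexpr int WINDOW = 60;          // seconds of history
      std::array<uint64_t, WINDOW> bytes_per_sec{};
      int head = 0;                              // slot for next sample

      void record(uint64_t bytes) {              // called once a second
        bytes_per_sec[head] = bytes;
        head = (head + 1) % WINDOW;
      }

      // Copy out oldest-to-newest, ready for a REST response.
      std::array<uint64_t, WINDOW> snapshot() const {
        std::array<uint64_t, WINDOW> out{};
        for (int i = 0; i < WINDOW; ++i)
          out[i] = bytes_per_sec[(head + i) % WINDOW];
        return out;
      }
    };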
> 
> I've written a small proof-of-concept service that just subscribes to
> cluster maps, loads a python module that acts as an HTTP server, and
> exposes the maps to the module.  It's here:
> https://github.com/jcsp/ceph/tree/wip-pyfoo/src/pyfoo
> 
> I appreciate that this might not all be completely clear in text form;
> some more detailed design and pictures will probably be needed in due
> course, but I wanted to put this out there to get feedback.
> 
> Cheers,
> John

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309


