Sorry, ignore the part about ceph-mon, I misread that sentence in the original mail.

Matt

----- Original Message -----
> From: "Matt Benjamin" <mbenjamin@xxxxxxxxxx>
> To: "John Spray" <jspray@xxxxxxxxxx>
> Cc: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, January 13, 2016 10:53:50 AM
> Subject: Re: Ideas for new ceph-mgr service
>
> Hi,
>
>
> ----- Original Message -----
> > From: "John Spray" <jspray@xxxxxxxxxx>
> > To: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
> > Sent: Wednesday, January 13, 2016 8:13:27 AM
> > Subject: Ideas for new ceph-mgr service
> >
> > Hi all,
> >
> > We currently have an unfulfilled need for a high level
> > management/monitoring service that can take some of the non-essential
> > tasks away from the mon (like handling the volume of advisory pg
> > stats), and provide a place to implement new features (like
> > cluster-wide command and control of the new scrub stuff in cephfs).
>
> I (and our group as a whole) think this will be a HUGE win, it's something
> we've talked about conceptually for years. Thank you sincerely for
> proposing and prototyping this!
>
> >
> > We've had a couple of attempts in this area historically:
> >  * ceph-rest-api, which is a stateless HTTP wrapper around the
> > MonCommand interface. All calls hit the mons directly, it's really
> > just a protocol converter, and it's really more RPC than REST.
> >  * Calamari, which has a very extensible architecture, but suffers
> > from being rather heavyweight, with lots of dependencies like its own
> > database, and requires its own separate agents running on all the Ceph
> > servers.
> >
> > So, the idea is to create a new lightweight service (ceph-mgr) that
> > runs alongside the mon, and uses the existing Ceph network channels to
> > talk to remote hosts. The address of this service would be published
> > in the OSDMap, and OSDs and other daemons would send their
> > non-essential stats to the mgr instead of the mon.
> > For HA we would probably run a mgr alongside each mon, and use
> > whichever mgr instance lived with the current leader mon.
> >
> > Internally, the mgr itself then has three main components:
> >  * The server (a Messenger), which receives telemetry from daemons
> > elsewhere in the system, and receives cluster map updates from the mon
> >  * A simple in memory store of all the structures that we receive from
> > the cluster (the maps, the daemon metadata, the pg stats)
> >  * An embedded python interpreter that hosts high level functionality
> > like a REST API.
> >
> > The mgr embodies the interface between "C++ Ceph land" (cephx auth,
> > Messenger, and ::encode/::decode serialization) and "admin land"
> > (JSON-like structures, REST APIs, Python modules). The reason for
> > doing this in one process, rather than putting the Python parts in a
> > separate service (like calamari) is twofold:
> >  * Code simplicity: avoid inventing a C++->Python network API that
> > re-implements things like cluster map subscription and incremental
> > OSDmaps.
> >  * Efficiency: transmit data in its native encoding, hold it in memory
> > in native structs, and only expose what's needed up into Python-land
> > at runtime.
>
> I would -strongly- request that we not use a hybrid C++ & Python server as a
> production version of this capability. If the proof of concept is as
> successful as I intuit, I think it would be highly desirable to design a
> scalable, native-code framework for the core management service runtime.
>
> Any apparent advantage from flexibility of Cython interfacing is, honestly, I
> think, strongly outweighed by the drawbacks of supporting the hybrid
> interfaces, not to mention the pervasive serialization and latency
> properties of a Python-driven runtime model.
> (That's not to say I think
> that Python shouldn't be used to implement routines called from a core
> management runtime, if you strongly prefer not to run such code
> out-of-process [as systems like Salt, iirc, do].)
>
> Matt
>
> >
> > That last part involves a bit of a trick: because Python (specifically
> > the CPython interpreter) is so flexible, we can do neat things like
> > implementing functions in C++ that have access to our native Ceph data
> > structures, but are callable from high level Python code. We can also
> > cast our C++ structures into Python dicts directly, without an
> > intermediate JSON step, using a magic Formatter subclass that
> > generates python objects instead of serializing. In general the
> > PyFormatter is still not quite as efficient as writing full blown
> > wrappers for C++ structures, but it's way more efficient than
> > serializing stuff to JSON and sending it over the network.
> >
> > Most of the business logic would then be written in python. This
> > would include the obvious status/health REST APIs, but potentially
> > also things like pool management (similar to how the Calamari API
> > handles these). As well as being accessible via a REST API, the stats
> > that live in the mgr could also be streamed on to a full featured time
> > series database like influxdb, for users that want to deploy that kind
> > of thing. Our service would store some very recent history, so that
> > folks without a full featured TSDB can still load things like the last
> > 60s of bandwidth into a graph in their GUI, if they have a GUI that
> > uses our API.
> >
> > I've written a small proof-of-concept service that just subscribes to
> > cluster maps, loads a python module that acts as an HTTP server, and
> > exposes the maps to the module.
> > It's here:
> > https://github.com/jcsp/ceph/tree/wip-pyfoo/src/pyfoo
> >
> > I appreciate that this might not all be completely clear in text form,
> > probably some more detailed design and pictures will be needed in due
> > course, but I wanted to put this out there to get feedback.
> >
> > Cheers,
> > John
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel. 734-707-0660
> fax. 734-769-8938
> cel. 734-216-5309
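
For readers following the thread, the "Formatter subclass that generates python objects instead of serializing" in John's proposal may be easier to picture with a small sketch. The code below is not taken from the prototype and is not Ceph's API. `DictFormatter`, its method names, and the `dump_osd_stats` sample data are hypothetical, chosen only to loosely resemble a section-based dump interface. It shows the general technique: the dumping code is written once against a formatter, and this particular formatter builds nested dicts and lists in memory, with no JSON text in between.

```python
class DictFormatter:
    """Hypothetical formatter that builds Python objects instead of text."""

    def __init__(self):
        self._root = {}
        # Stack of currently open containers; the root dict is always at the bottom.
        self._stack = [self._root]

    def _add(self, name, value):
        top = self._stack[-1]
        if isinstance(top, list):
            top.append(value)      # inside an array section the name is ignored
        else:
            top[name] = value

    def open_object_section(self, name):
        section = {}
        self._add(name, section)
        self._stack.append(section)

    def open_array_section(self, name):
        section = []
        self._add(name, section)
        self._stack.append(section)

    def close_section(self):
        if len(self._stack) == 1:
            raise RuntimeError("no open section to close")
        self._stack.pop()

    def dump(self, name, value):
        self._add(name, value)

    def get(self):
        return self._root


def dump_osd_stats(f):
    """Stand-in for a native structure's dump routine (made-up sample data)."""
    f.open_object_section("osd_stats")
    f.dump("num_osds", 3)
    f.open_array_section("osds")
    for osd_id, up in [(0, True), (1, True), (2, False)]:
        f.open_object_section("osd")
        f.dump("id", osd_id)
        f.dump("up", up)
        f.close_section()
    f.close_section()   # closes "osds"
    f.close_section()   # closes "osd_stats"


formatter = DictFormatter()
dump_osd_stats(formatter)
result = formatter.get()

# The result is an ordinary nested dict, usable directly by Python code.
down_osds = [o["id"] for o in result["osd_stats"]["osds"] if not o["up"]]
```

A module hosted in the embedded interpreter could consume `result` directly, for example to build a REST response. In the design described in the thread the dump routine would live on the C++ side, which this pure-Python sketch does not attempt to model.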