Re: Ideas for new ceph-mgr service

Sorry, ignore the part about ceph-mon; I misread that sentence in the original mail.

Matt

----- Original Message -----
> From: "Matt Benjamin" <mbenjamin@xxxxxxxxxx>
> To: "John Spray" <jspray@xxxxxxxxxx>
> Cc: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, January 13, 2016 10:53:50 AM
> Subject: Re: Ideas for new ceph-mgr service
> 
> Hi,
> 
> 
> ----- Original Message -----
> > From: "John Spray" <jspray@xxxxxxxxxx>
> > To: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
> > Sent: Wednesday, January 13, 2016 8:13:27 AM
> > Subject: Ideas for new ceph-mgr service
> > 
> > Hi all,
> > 
> > We currently have an unfulfilled need for a high level
> > management/monitoring service that can take some of the non-essential
> > tasks away from the mon (like handling the volume of advisory pg
> > stats), and provide a place to implement new features (like
> > cluster-wide command and control of the new scrub stuff in cephfs).
> 
> I (and our group as a whole) think this will be a HUGE win; it's something
> we've talked about conceptually for years.  Thank you sincerely for
> proposing and prototyping this!
> 
> > 
> > We've had a couple of attempts in this area historically:
> >  * ceph-rest-api, which is a stateless HTTP wrapper around the
> > MonCommand interface.  All calls hit the mons directly; it's really
> > just a protocol converter, and more RPC than REST.
> >  * Calamari, which has a very extensible architecture, but suffers
> > from being rather heavyweight, with lots of dependencies like its own
> > database, and requires its own separate agents running on all the Ceph
> > servers.
> > 
> > So, the idea is to create a new lightweight service (ceph-mgr) that
> > runs alongside the mon, and uses the existing Ceph network channels to
> > talk to remote hosts.  The address of this service would be published
> > in the OSDMap, and OSDs and other daemons would send their
> > non-essential stats to the mgr instead of the mon.  For HA we would
> > probably run a mgr alongside each mon, and use whichever mgr instance
> > lived with the current leader mon.
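> > 
> > Conceptually, picking the active mgr on the daemon side would be as
> > simple as something like the following (purely illustrative; the real
> > map structures would of course look different):
> > 
> >   #include <map>
> >   #include <string>
> > 
> >   // Hypothetical view of what the OSDMap might publish: one mgr
> >   // address per mon, keyed by mon name.  Daemons send their
> >   // non-essential stats to the mgr colocated with the leader mon.
> >   std::string pick_active_mgr(
> >       const std::map<std::string, std::string> &mgr_addr_by_mon,
> >       const std::string &leader_mon)
> >   {
> >     auto it = mgr_addr_by_mon.find(leader_mon);
> >     if (it != mgr_addr_by_mon.end())
> >       return it->second;
> >     return {};  // no mgr known yet; daemons would just hold their stats
> >   }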
> > 
> > Internally, the mgr itself then has three main components:
> >  * The server (a Messenger), which receives telemetry from daemons
> > elsewhere in the system, and receives cluster map updates from the mon
> >  * A simple in memory store of all the structures that we receive from
> > the cluster (the maps, the daemon metadata, the pg stats)
> >  * An embedded python interpreter that hosts high level functionality
> > like a REST API.
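> > 
> > As a very rough sketch of that last piece (the module and function
> > names here are illustrative, not a real module API), the embedding
> > amounts to loading a Python module in-process with the CPython API and
> > calling into it:
> > 
> >   #include <Python.h>
> >   #include <stdexcept>
> > 
> >   // Load a Python module inside the mgr process and hand control to
> >   // it, so that high level code (e.g. an HTTP server) runs in-process
> >   // with direct access to the cluster state held in C++.
> >   void run_hosted_module(const char *module_name)
> >   {
> >     Py_Initialize();
> > 
> >     PyObject *mod = PyImport_ImportModule(module_name);
> >     if (!mod) {
> >       PyErr_Print();
> >       throw std::runtime_error("failed to import hosted module");
> >     }
> > 
> >     // Assume the module exposes a blocking entry point, e.g. serve();
> >     // cluster map updates would be pushed in via further calls.
> >     PyObject *ret = PyObject_CallMethod(mod, "serve", NULL);
> >     if (!ret)
> >       PyErr_Print();
> > 
> >     Py_XDECREF(ret);
> >     Py_DECREF(mod);
> >     Py_Finalize();
> >   }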
> > 
> > The mgr embodies the interface between "C++ Ceph land" (cephx auth,
> > Messenger, and ::encode/::decode serialization) and "admin land"
> > (JSON-like structures, REST APIs, Python modules).  The reason for
> > doing this in one process, rather than putting the Python parts in a
> > separate service (like calamari) is twofold:
> >  * Code simplicity: avoid inventing a C++->Python network API that
> > re-implements things like cluster map subscription and incremental
> > OSDmaps.
> >  * Efficiency: transmit data in its native encoding, hold it in memory
> > in native structs, and only expose what's needed up into Python-land
> > at runtime.

> 
> I would -strongly- request that we not use a hybrid C++ & Python server as a
> production version of this capability.  If the proof of concept is as
> successful as I intuit, I think it would be highly desirable to design a
> scalable, native-code framework for the core management service runtime.
> 
> Any apparent advantage from the flexibility of Cython interfacing is, I
> think, strongly outweighed by the drawbacks of supporting the hybrid
> interfaces, not to mention the pervasive serialization and latency
> properties of a Python-driven runtime model.  (That's not to say that
> Python shouldn't be used to implement routines called from a core
> management runtime, if you strongly prefer not to run such code
> out-of-process [as systems like Salt, iirc, do].)
> 
> Matt
> 
> > 
> > That last part involves a bit of a trick: because Python (specifically
> > the CPython interpreter) is so flexible, we can do neat things like
> > implementing functions in C++ that have access to our native Ceph data
> > structures, but are callable from high level Python code.  We can also
> > cast our C++ structures into Python dicts directly, without an
> > intermediate JSON step, using a magic Formatter subclass that
> > generates python objects instead of serializing.  In general the
> > PyFormatter is still not quite as efficient as writing full blown
> > wrappers for C++ structures, but it's way more efficient than
> > serializing stuff to JSON and sending it over the network.
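> > 
> > The guts of that trick are roughly as follows (a heavily simplified
> > sketch, not the real Formatter interface: arrays, nesting details and
> > error handling are all glossed over):
> > 
> >   #include <Python.h>
> >   #include <string>
> >   #include <vector>
> > 
> >   // A formatter-like object whose dump calls build Python objects
> >   // directly, instead of serializing to JSON text and parsing it back.
> >   class PyFormatterSketch {
> >    public:
> >     PyFormatterSketch() : root(PyDict_New()) { stack.push_back(root); }
> > 
> >     void dump_int(const char *name, long v) {
> >       set(name, PyLong_FromLong(v));
> >     }
> >     void dump_string(const char *name, const std::string &v) {
> >       set(name, PyUnicode_FromString(v.c_str()));
> >     }
> >     void open_object_section(const char *name) {
> >       PyObject *d = PyDict_New();
> >       set(name, d);          // the parent dict now owns a reference
> >       stack.push_back(d);
> >     }
> >     void close_section() { stack.pop_back(); }
> > 
> >     // Hand the finished dict up into Python-land.
> >     PyObject *get() { return root; }
> > 
> >    private:
> >     void set(const char *name, PyObject *v) {
> >       PyDict_SetItemString(stack.back(), name, v);  // adds its own ref
> >       Py_DECREF(v);
> >     }
> >     PyObject *root;
> >     std::vector<PyObject*> stack;
> >   };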
> > 
> > Most of the business logic would then be written in Python.  This
> > would include the obvious status/health REST APIs, but potentially
> > also things like pool management (similar to how the Calamari API
> > handles these).  As well as being accessible via a REST API, the stats
> > that live in the mgr could also be streamed on to a full-featured time
> > series database like influxdb, for users who want to deploy that kind
> > of thing.  Our service would store some very recent history, so that
> > folks without a full-featured TSDB can still load things like the last
> > 60s of bandwidth into a graph in their GUI, if they have a GUI that
> > uses our API.
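> > 
> > The recent-history part could be as simple as a per-counter ring
> > buffer (sketch only; sizes and types picked arbitrarily):
> > 
> >   #include <cstddef>
> >   #include <deque>
> >   #include <map>
> >   #include <string>
> >   #include <utility>
> > 
> >   // Keep the last N samples of each counter in memory so a GUI can
> >   // draw e.g. the last 60s of bandwidth without an external TSDB.
> >   class RecentStats {
> >    public:
> >     explicit RecentStats(std::size_t max_samples = 60)
> >       : max_samples(max_samples) {}
> > 
> >     void record(const std::string &counter, double t, double value) {
> >       auto &q = series[counter];
> >       q.emplace_back(t, value);
> >       if (q.size() > max_samples)
> >         q.pop_front();
> >     }
> > 
> >     // Samples as (timestamp, value) pairs, oldest first.
> >     const std::deque<std::pair<double, double>> &
> >     get(const std::string &counter) { return series[counter]; }
> > 
> >    private:
> >     std::size_t max_samples;
> >     std::map<std::string, std::deque<std::pair<double, double>>> series;
> >   };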
> > 
> > I've written a small proof-of-concept service that just subscribes to
> > cluster maps, loads a python module that acts as an HTTP server, and
> > exposes the maps to the module.  It's here:
> > https://github.com/jcsp/ceph/tree/wip-pyfoo/src/pyfoo
> > 
> > I appreciate that this might not all be completely clear in text form;
> > some more detailed design and pictures will probably be needed in due
> > course, but I wanted to put this out there to get feedback.
> > 
> > Cheers,
> > John
> > 
> 
> -- 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-707-0660
> fax.  734-769-8938
> cel.  734-216-5309
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309