On Thu, Nov 8, 2018 at 2:04 AM Erwan Velu <evelu@xxxxxxxxxx> wrote:
>
> Thanks Noah & Zack for your answers.
>
> In fact what I need here is being able to anticipate, for a given
> version, what the JSON structure of a given command will be.
>
> Let's consider "ceph osd metadata -f".
>
> When I'm doing this
> (https://github.com/ErwanAliasr1/skydive/blob/1fa8c596823bcc53dd7fcecec8c9a529514a2a88/topology/probes/ceph/osd.go#L91),
> on a 12.2.5, I get
> https://github.com/ErwanAliasr1/skydive/blob/1fa8c596823bcc53dd7fcecec8c9a529514a2a88/topology/probes/ceph/osd.go#L39
>
> For OSDs, that's pretty trivial, but when you consider running "ceph -s
> -f json", I end up with
> https://github.com/ErwanAliasr1/skydive/blob/1fa8c596823bcc53dd7fcecec8c9a529514a2a88/topology/probes/ceph/cluster.go#L37
> and that's still not complete, as I didn't run every Ceph component on
> this cluster.
>
> For a 3rd party tool like mine, but surely also for the manager or some
> other ones, we should be able to anticipate what the expected schema
> for a release will be.
>
> If I understood your PR correctly, I'm not sure it covers the output
> of all the commands a user/3rd party can call.

You're right that the PR I referenced doesn't exactly solve your issue,
but I do think it may be a significant step towards what you are looking
for. Indeed, I have a similar (if not identical) challenge with the
ongoing work of integrating Ceph with Red Hat Insights.

Nearly all of the output produced by Ceph CLI commands using `-f json`
is driven by a JSON serialization of an internal data structure (or a
combination of data structures). The PR I posted takes a bottom-up
approach to tackling the higher-level goal that you and I share: it
associates a fully defined JSON schema with each data structure, and
adds a unit test, run as part of `make test`, that automates
verification of the schema.
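To make that verification idea concrete outside the Ceph tree: the check
boils down to "serialize a structure, then validate the result against a
fixed schema". Here is a toy sketch in Go (the language of Erwan's tool),
with a hand-rolled required-fields/type check standing in for a real JSON
Schema validator; the field names (`id`, `hostname`, `ceph_version`) are
illustrative assumptions, not the authoritative OSD metadata schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// A toy stand-in for a JSON schema: required field names mapped to the
// JSON type expected for each. These fields are assumptions for the
// sake of the example, not taken from Ceph's actual output.
var osdMetadataSchema = map[string]string{
	"id":           "number",
	"hostname":     "string",
	"ceph_version": "string",
}

// validate returns the names of required fields that are missing from
// doc or carry the wrong JSON type.
func validate(doc map[string]interface{}, schema map[string]string) []string {
	var bad []string
	for field, want := range schema {
		v, ok := doc[field]
		if !ok {
			bad = append(bad, field+" (missing)")
			continue
		}
		got := "unknown"
		switch v.(type) {
		case float64: // encoding/json decodes all JSON numbers as float64
			got = "number"
		case string:
			got = "string"
		case bool:
			got = "boolean"
		}
		if got != want {
			bad = append(bad, field+" (got "+got+")")
		}
	}
	return bad
}

func main() {
	// A hand-written sample document playing the role of one element
	// of `ceph osd metadata -f json` output.
	sample := []byte(`{"id": 0, "hostname": "osd-node-1", "ceph_version": "ceph version 12.2.5"}`)
	var doc map[string]interface{}
	if err := json.Unmarshal(sample, &doc); err != nil {
		panic(err)
	}
	fmt.Println(len(validate(doc, osdMetadataSchema))) // 0: the sample conforms
}
```

A real unit test would do the same thing with the structure's own
serialization method on one side and the published schema on the other.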
It is bottom-up in the sense that these structures are found embedded in
the final output of a Ceph CLI `-f json` request. This turns out to be
super convenient because the JSON Schema standard allows schema
composition: if the serialization of a structure is nested in the output
of other structures or CLI outputs, then the nested schemas can be
re-used.

When a CLI command's output corresponds exactly to one of these
low-level structures (as is sometimes the case now), things are easy. As
John points out, some of the CLI output is built up programmatically. In
those cases the output often contains many instances of structures
_with_ schemas, but the top-level schema may have an ad-hoc structure.
However, all is not lost! I think there is a fairly simple, albeit
tedious, way forward:

1) CLI schemas

Following the bottom-up approach, the next step may be to add schemas
for the CLI methods that reuse the low-level structure schemas where
possible.

1.a) The ideal option: instead of using a free-form / ad-hoc
construction of JSON output for CLI commands, define new internal data
structures that are built up to contain the final output. Such a
structure is then easy to associate with a serialization method and a
schema. This is ideal because we can define a covering set of structure
instantiations directly, which would otherwise be much harder to produce
by driving a cluster into each particular state that yields a given
output.

1.b) The quicker option (opinions may differ): in principle there is
still a deterministic covering set of possible outputs, which means that
there is a schema; it just needs to be teased out by reading through the
possible code paths for a particular CLI command.

2) Versioning

In the general case of a mixed-version cluster it would seem that either
(1) a data source must expose its version, or (2) data must be tagged
with a version. Based on what John mentioned "the OSDs themselves are
passing up a map of strings to strings.
Similarly, the servicemap is basically freeform." This level of
indirection seems to suggest that (2) is really the only option.
However, this too should be easy: nearly all of the structures that are
serialized to JSON also have associated serialization methods for
dumping out binary encodings, which themselves have access to version
information, so the data is self-identifying (at least that's my
understanding).

3) Schema publishing

This probably depends heavily on the user of the schema. The simplest
use case is verifying with unit tests, for a specific version, whether
CLI output matches the associated schema. For other programmatic tools,
I think that in order to handle mixed-version clusters easily, a new
meta-level manager (monitor?) command should be created that exposes the
covering set of schemas found in the cluster. For other users, schemas
could be published along with the docs or a *-dev[el] package.

Erwan, I realize that's a lot of brain dump. I think this is a really
important topic as Ceph is integrated into more and more places that
need machine-readable output!

- Noah

> So I wonder if I'm using the right interface or the right way to collect
> information about a Ceph cluster. If I want to make a structured
> representation of various Ceph releases (and containers will generate
> this situation), I wonder how to handle that :/
>
> On 07/11/2018 at 21:33, Noah Watkins wrote:
> > Hey Erwan,
> >
> > This sounds similar to something I started recently, but haven't been
> > able to finish completely. Although, it's actually probably pretty
> > close to being able to merge. Let me know if it seems like it might
> > help out and we can work out what's needed to handle your case.
> >
> > https://github.com/ceph/ceph/pull/23716
> >
> > - Noah
> >
> > On Wed, Nov 7, 2018 at 7:11 AM Erwan Velu <evelu@xxxxxxxxxx> wrote:
> >> Hi list,
> >>
> >> I'm working on a tool that reads the JSON output of several ceph
> >> commands.
> >>
> >> To ease the parsing, I've chosen the JSON format, which guarantees
> >> parseable output.
> >>
> >> I'm using the Unmarshal feature of golang to map this output onto an
> >> internal data structure, so every member of this JSON output is
> >> easily reachable from the code.
> >>
> >> That works pretty well, except that I have to "anticipate" what the
> >> members and their types will be.
> >>
> >> To do that, I've been transforming the sample output of my Ceph
> >> cluster (luminous) into a data struct with
> >> https://mholt.github.io/json-to-go/
> >>
> >> That works fine, except that I would need a complete output to get
> >> every possible combination of the JSON output.
> >>
> >> So instead of reverse engineering the JSON output, and doing that per
> >> version of Ceph since versions can change the format, how can I
> >> extract the complete JSON schema of each command?
> >>
> >> Erwan,
> >>
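P.S. Until published schemas exist, one way a Go tool can at least
*notice* when a release changes the output shape is to decode strictly
and treat unknown fields as a signal, rather than letting `Unmarshal`
silently drop them. A minimal sketch using the standard library's
`DisallowUnknownFields`; the struct fields and the sample inputs are
illustrative assumptions, not the authoritative `ceph osd metadata`
schema:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// An illustrative subset of OSD metadata fields; the real output has
// many more, and these names are assumptions for the example.
type osdMetadata struct {
	ID          int    `json:"id"`
	Hostname    string `json:"hostname"`
	CephVersion string `json:"ceph_version"`
}

// strictDecode fails if the input contains fields the target struct
// does not declare -- a cheap way to detect that a newer release has
// grown the output beyond what the tool anticipated.
func strictDecode(data []byte, v interface{}) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	var m osdMetadata

	// All fields declared by the struct: strict decoding succeeds.
	known := []byte(`{"id": 2, "hostname": "node-a", "ceph_version": "12.2.5"}`)
	fmt.Println(strictDecode(known, &m) == nil) // true

	// A hypothetical field added by some newer release would be dropped
	// silently by plain json.Unmarshal, but strictDecode surfaces it.
	newer := []byte(`{"id": 2, "hostname": "node-a", "ceph_version": "13.2.0", "some_new_field": "x"}`)
	fmt.Println(strictDecode(newer, &m) == nil) // false
}
```

This doesn't replace a real schema, of course; it only turns "fields I
didn't anticipate" from silent data loss into an explicit error the tool
can log per Ceph version.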