On Wed, Dec 01 2021, Jeff Hostetler wrote: > On 11/12/21 5:33 PM, Ævar Arnfjörð Bjarmason wrote: >> On Thu, Nov 11 2021, Junio C Hamano wrote: >> >>> Josh Steadmon <steadmon@xxxxxxxxxx> writes: >>> >>>> On 2021.11.11 15:03, Junio C Hamano wrote: >>>>> Josh Steadmon <steadmon@xxxxxxxxxx> writes: >>>>> >>>>>> In 64bc752 (trace2: add trace2_child_ready() to report on background >>>>>> children, 2021-09-20), we added a new "child_ready" event. In >>>>>> Documentation/technical/api-trace2.txt, we promise that adding a new >>>>>> event type will result in incrementing the trace2 event format version >>>>>> number, but this was not done. Correct this in code & docs. >>>>>> > ... >> On the field itself I also wonder if it's useful at all. I'd think >> anyone implementing a parser for the format would dispatch to a lookup >> handling known keys, so having a version indicating "new keys here" >> seems rather useless. >> > > That may be true, but it is easier to have a version number and > allow parsers to ignore it, than it is to not have a version number > and at some point in the future require parsers try to figure it > out. IMHO. > > So far we've only added new event types (cmd_ancestry and child_ready) > and everything is in regular JSON forms, so parsing and dispatching > is relatively easy and the version number is not really needed. > But, if in the future we decide to change the contents within one of > those events, then the version number may be significant. Yes, exactly. Versioning it makes sense, I'm commenting specifically on the criteria for bumping the version. I.e. this bit in trace2/tr2_tgt_event.c (with my comments):: * The version should be incremented if new event types are added That seems entirely useless, i.e. just ignore unknown types. If you get an unknown type you know something is new, no need for a version number bump for this case. It adds no new information you're not getting with "oh? a new field?". * if existing fields are removed, [...] Useful, you don't always know if the absence of a field means it won't be there at all, or if something else is up, particularly if the field is optional. E.g. if we rename "error" to "exception" (or whatever) we don't want an older consumer to report that there's no errors happening. * [...]or if there are significant changes in * interpretation of existing events or fields. Carving out "significant changes" seems rather odd. Let's just say "any changes"? E.g. I don't think my 2d3491b117c (tr2: log N parent process names on Linux, 2021-08-27) qualifies as significant change, but why not bump the version even for that? In that case it was part of a release where we bumped the version anyway, but let's come up with some sane criteria. What's "significant" is also entirely subjective. I think 2d3491b117c is trivial in the context of the entirety of our trace2 data. But if all you care about is logging process trees it's highly significant. * Smaller changes, such as adding * a new field to an existing event, do not require an increment to the EVENT * format version. I'd think a new sub-field would be in the same class as a new top-level event field, i.e. adding a version wouldn't add any information. It falls under the same general rule that you can traverse future data with your "version" in hand, and a whitelist of fields you know how to handle. If the version is the same but you've got new unknown fields you shouldn't need to care. IOW I think this would make more sense as a version bumping criteria: The version should be incremented whenever an existing consumer of trace2 data might want to act differently based on the new data. An exception to this is that any new event types do not merit bumping the version number. E.g. we have a top-level event type "error" now, but might hypothetically add a new "warning" type. Such an addition won't require bumping the version. Likewise adding new mandatory fields to existing events doesn't require bumping the version. E.g. the "error" type has (as of writing) a "fmt" and "msg" field. Let's say a future version adds an "id" (as in unique id for the error) field, such an addition won't require bumping the version. In other words, consumers of the trace2 JSON format are expected to walk the structure and only pick those things that they know about. Any unknown fields the consumer doesn't know about can be safely discarded. This won't apply if the version is bumped, then all bets are off, and the meaning of existing fields may or may not have changed. The idea is to encourage additive changes over changes to existing fields, and to reduce the work in maintaining the consumers of the format. As long as consumers ignore new unknown data they won't need to be updated every time the format changes in any way, only for potentially backwards-incompatible changes. Wouldn't this be a saner policy for version bumping? AFAICT the only thing you wouldn't be getting that you're getting now is the trivial optimization of being able to say cheaply route trace2 payloads based on version number alone (but even that is iffy now due to the subjectivity of "significant change").