This is a proof of concept series that formalizes the structure of trace2 event output using JSON-Schema [1]. It provides a validator (written in Go) that verifies the events in a given trace2 event output file match the schema. I am happy to rewrite this validator in some other language, provided that the language has a JSON-Schema library supporting at least draft-04. It runs the validator as part of the CI suite (it increase the runtime by about 15 minutes). It tests that the trace output of "make test" conforms to the schema. Users of the trace2 event output can be relatively confident that the output format has not changed so long as the schema file remains the same and the regression test is passing. I would appreciate any feedback on better ways to integrate the validator into the CI suite. I have not added support for standalone schema validators (as requested in the discussion of V1 of this series) because the few that I tested on my workstation ran for multiple hours (vs. 15 minutes for the validator included in this series). If someone can suggest a performant standalone validator, I will be happy to test that. [1]: https://json-schema.org/ Changes since V2 of this series: * corrected commit message regarding the different schema variations * cleaned up the Makefile * added comment noting that the validator expects JSON-Lines input * added a --progress flag to the validator * improved validation error output Changes since V1 of this series: * dropped the documenation fix, as it can be submitted separately from this series * added JSON-array versions of the schema (currently unused) * added the validation test to the CI suite Josh Steadmon (3): trace2: Add a JSON schema for trace2 events trace2: add a schema validator for trace2 events ci: run trace2 schema validation in the CI suite ci/run-build-and-tests.sh | 6 + t/trace_schema_validator/.gitignore | 1 + t/trace_schema_validator/Makefile | 18 + t/trace_schema_validator/README | 23 + t/trace_schema_validator/event_schema.json | 398 ++++++++++++++ t/trace_schema_validator/list_schema.json | 401 ++++++++++++++ .../strict_list_schema.json | 514 ++++++++++++++++++ t/trace_schema_validator/strict_schema.json | 511 +++++++++++++++++ .../trace_schema_validator.go | 82 +++ 9 files changed, 1954 insertions(+) create mode 100644 t/trace_schema_validator/.gitignore create mode 100644 t/trace_schema_validator/Makefile create mode 100644 t/trace_schema_validator/README create mode 100644 t/trace_schema_validator/event_schema.json create mode 100644 t/trace_schema_validator/list_schema.json create mode 100644 t/trace_schema_validator/strict_list_schema.json create mode 100644 t/trace_schema_validator/strict_schema.json create mode 100644 t/trace_schema_validator/trace_schema_validator.go Range-diff against v2: 1: a949db776c ! 1: d4e82796bc trace2: Add a JSON schema for trace2 events @@ Commit message objects. This can be used to add regression tests to verify that the event output format does not change unexpectedly. - Two versions of the schema are provided: + Four versions of the schema are provided: * event_schema.json is more permissive. It verifies that all expected - fields are present in each trace event, but it allows traces to have + fields are present in a trace event, but it allows traces to have unexpected additional fields. This allows the schema to be specified more concisely by factoring out the common fields into a reusable sub-schema. * strict_schema.json is more restrictive. It verifies that all expected - fields are present and no unexpected fields are present in each trace + fields are present and no unexpected fields are present in the trace event. Due to this additional restriction, the common fields cannot be factored out into a re-usable subschema (at least as-of draft-07) [2], and must be repeated for each event definition. + * list_schema.json is like event_schema.json above, but validates a JSON + array of trace events, rather than a single event. + * strict_list_schema.json is like strict_schema.json above, but + validates a JSON array of trace events, rather than a single event. [1]: https://json-schema.org/ [2]: https://json-schema.org/understanding-json-schema/reference/combining.html#allof 2: 3fa4e9eef8 ! 2: 97cb6a3eb4 trace2: add a schema validator for trace2 events @@ t/trace_schema_validator/.gitignore (new) ## t/trace_schema_validator/Makefile (new) ## @@ ++RM = rm -f ++PROGRAMS = trace_schema_validator ++GOCMD = go ++GOBUILD = $(GOCMD) build ++GOGET = $(GOCMD) get ++ +.PHONY: fetch_deps clean + ++all: $(PROGRAMS) ++ +trace_schema_validator: fetch_deps trace_schema_validator.go -+ go build ++ $(GOBUILD) -o trace_schema_validator + +fetch_deps: -+ go get github.com/xeipuuv/gojsonschema ++ $(GOGET) github.com/xeipuuv/gojsonschema + +clean: -+ rm -f trace_schema_validator ++ $(RM) $(PROGRAMS) ## t/trace_schema_validator/trace_schema_validator.go (new) ## @@ +// trace_schema_validator validates individual lines of an input file against a +// provided JSON-Schema for git trace2 event output. +// ++// Note that this expects each object to validate to be on its own line in the ++// input file (AKA JSON-Lines format). This is what Git natively writes with ++// GIT_TRACE2_EVENT enabled. ++// +// Traces can be collected by setting the GIT_TRACE2_EVENT environment variable +// to an absolute path and running any Git command; traces will be appended to +// the file. +// +// Traces can then be verified like so: +// trace_schema_validator \ -+// --trace2_event_file /path/to/trace/output \ -+// --schema_file /path/to/schema ++// --trace2-event-file /path/to/trace/output \ ++// --schema-file /path/to/schema +package main + +import ( @@ t/trace_schema_validator/trace_schema_validator.go (new) +) + +// Required flags -+var schemaFile = flag.String("schema_file", "", "JSON-Schema filename") -+var trace2EventFile = flag.String("trace2_event_file", "", "trace2 event filename") ++var schemaFile = flag.String("schema-file", "", "JSON-Schema filename") ++var trace2EventFile = flag.String("trace2-event-file", "", "trace2 event filename") ++var progress = flag.Int("progress", 0, "Print progress message each time we have validated this many lines. --progress=0 means no messages are printed") + +func main() { + flag.Parse() + if *schemaFile == "" || *trace2EventFile == "" { -+ log.Fatal("Both --schema_file and --trace2_event_file are required.") ++ log.Fatal("Both --schema-file and --trace2-event-file are required.") + } + schemaURI, err := filepath.Abs(*schemaFile) + if err != nil { @@ t/trace_schema_validator/trace_schema_validator.go (new) + + count := 0 + for ; scanner.Scan(); count++ { -+ if count%10000 == 0 { -+ // Travis-CI expects regular output or it will time out. ++ if *progress != 0 && count%*progress == 0 { + log.Print("Validated items: ", count) + } + event := gojsonschema.NewStringLoader(scanner.Text()) @@ t/trace_schema_validator/trace_schema_validator.go (new) + log.Fatal(err) + } + if !result.Valid() { -+ log.Print("Trace event is invalid: ", scanner.Text()) ++ log.Printf("Trace event line %d is invalid: %s", count+1, scanner.Text()) + for _, desc := range result.Errors() { + log.Print("- ", desc) + } 3: acf3aebcaa ! 3: a07458b2e4 ci: run trace2 schema validation in the CI suite @@ ci/run-build-and-tests.sh: then make test + t/trace_schema_validator/trace_schema_validator \ + --trace2_event_file=${GIT_TRACE2_EVENT} \ -+ --schema_file=t/trace_schema_validator/strict_schema.json ++ --schema_file=t/trace_schema_validator/strict_schema.json \ ++ --progress=10000 fi check_unignored_build_artifacts -- 2.22.0.709.g102302147b-goog