Derrick Stolee <stolee@xxxxxxxxx> writes: > +'read':: > + > +Read a graph file given by the graph-head file and output basic > +details about the graph file. > ++ > +With `--file=<name>` option, consider the graph stored in the file at > +the path <object-dir>/info/<name>. > + A sample reader confusion after reading the above twice: What is "the graph-head file" and how does the user specify it? Is it given by the value for the "--file=<name>" command line option? Another sample reader reaction after reading the above: What are the kind of "basic details" we can learn from this command is unclear, but perhaps there is an example to help me decide if this command is worth studying. > @@ -44,6 +53,12 @@ EXAMPLES > $ git commit-graph write > ------------------------------------------------ > > +* Read basic information from a graph file. > ++ > +------------------------------------------------ > +$ git commit-graph read --file=<name> > +------------------------------------------------ > + And the sample reader is utterly disappointed at this point. > +static int graph_read(int argc, const char **argv) > +{ > + struct commit_graph *graph = 0; > + struct strbuf full_path = STRBUF_INIT; > + > + static struct option builtin_commit_graph_read_options[] = { > + { OPTION_STRING, 'o', "object-dir", &opts.obj_dir, > + N_("dir"), > + N_("The object directory to store the graph") }, > + { OPTION_STRING, 'H', "file", &opts.graph_file, > + N_("file"), > + N_("The filename for a specific commit graph file in the object directory."), > + PARSE_OPT_OPTARG, NULL, (intptr_t) "" }, > + OPT_END(), > + }; The same comment as all the previous ones apply, wrt short options and non-use of OPT_STRING(). Also, I suspect that these two would want to use OPT_FILENAME instead, if we anticipate that the command might want to be sometimes run from a subdirectory. Otherwise wouldn't cd t && git commit-graph read --file=../.git/object/info/$whatever end up referring to a wrong place because the code that uses the value obtained from OPTION_STRING does not do the equivalent of parse-options.c::fix_filename()? The same applies to object-dir handling. > + argc = parse_options(argc, argv, NULL, > + builtin_commit_graph_read_options, > + builtin_commit_graph_read_usage, 0); > + > + if (!opts.obj_dir) > + opts.obj_dir = get_object_directory(); > + > + if (!opts.graph_file) > + die("no graph hash specified"); > + > + strbuf_addf(&full_path, "%s/info/%s", opts.obj_dir, opts.graph_file); Ahh, I was fooled by a misnamed option. --file does *not* name the file. It is a filename in a fixed place that is determined by other things. So it would be a mistake to use OPT_FILENAME() in the parser for that misnamed "--file" option. The parser for --object-dir still would want to be OPT_FILENAME(), but quite honestly, I do not see the point of having --object-dir option in the first place. The graph file is not relative to it but is forced to have /info/ in between that directory and the filename, so it is not like the user gets useful flexibility out of being able to specify two different places using --object-dir= option and $GIT_OBJECT_DIRECTORY environment (iow, a caller that wants to work on a specific object directory can use the environment, which is how it would tell any other Git subcommand which object store it wants to work with). But stepping back a bit, I think the way --file argument is defined is halfway off from two possible more useful ways to define it. If it were just "path to the file" (iow, what OPT_FILENAME() is suited for parsing it), then a user could say "I have this graph file that I created for testing, it is not installed in its usual place in $GIT_OBJECT_DIRECTORY/info/ at all, but I want you to read it because I am debugging". That is one possible useful extreme. The other possibility would be to allow *only* the hash part to be specified, iow, not just forcing /info/ relative to object directory, you would force the "graph-" prefix and ".graph" suffix. That would be the other extreme that is useful (less typing and less error prone). For a low-level command line this, my gut feeling is that it would be better to allow paths to the object directory and the graph file to be totally independently specified. > + if (graph_signature != GRAPH_SIGNATURE) { > + munmap(graph_map, graph_size); > + close(fd); > + die("graph signature %X does not match signature %X", > + graph_signature, GRAPH_SIGNATURE); > + } > + > + graph_version = *(unsigned char*)(data + 4); > + if (graph_version != GRAPH_VERSION) { > + munmap(graph_map, graph_size); > + close(fd); > + die("graph version %X does not match version %X", > + graph_version, GRAPH_VERSION); > + } > + > + hash_version = *(unsigned char*)(data + 5); > + if (hash_version != GRAPH_OID_VERSION) { > + munmap(graph_map, graph_size); > + close(fd); > + die("hash version %X does not match version %X", > + hash_version, GRAPH_OID_VERSION); It becomes a bit tiring to see munmap/close/die pattern repreated over and over again, doesn't it? Can we make it simpler, perhaps by letting die() take care of the clean-up? After all, if the very next step dies because alloc_commit_graph() got NULL in xmalloc(), we are letting die() there take care of the clean-up anyway already, and die() in the chunk parsing look has no such cleanup, either. Of course, when we later want to libify this part of the code, then we wouldn't be calling die() from this codepath, but the change required to do so will not be just s/die/error/; it would be more like if (x_version != X_VERSION) { error("X version %X does not match",...); goto cleanup_fail; } with munmap/close done at the jumped-to label. > + } > + > + graph = alloc_commit_graph(); > + > + graph->hash_len = GRAPH_OID_LEN; > + ... > + if (chunk_offset > graph_size - GIT_MAX_RAWSZ) > + die("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32), > + (uint32_t)chunk_offset); > + > + switch (chunk_id) { > + case GRAPH_CHUNKID_OIDFANOUT: > + graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset); > + break; This is over-indented from our point of view. In our codebase, case arms aling with switch, i.e. switch (chunk_id) { case GRAPH_CHUNKID_OIDFANOUT: graph->chunk_oid_fanout = ...; break; When the input file has GRAPH_CHUNKID_OIDFANOUT twice, I think it should be flagged as a corrupt/malformed input file, causing the reader to reject it. It is plausible that you wanted to make it "the last one wins", but even if that is the case, I think the user should at least get a warning, as (I'd imagine) it is an unusual condition. The same applies to multiple instances of any currently-defined chunk types. > +graph_read_expect() { > + OPTIONAL="" > + NUM_CHUNKS=3 > + if [ ! -z $2 ] We use "test" and do not use "[ ... ]" or "[[ ... ]]". I'll stop here.