On 1/27/2019 8:28 AM, SZEDER Gábor wrote: > On Sun, Jan 27, 2019 at 02:08:32PM +0100, SZEDER Gábor wrote: >> When the commit graph and generation numbers were introduced in >> commits 177722b344 (commit: integrate commit graph with commit >> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to >> struct commit, 2018-04-25), they tried to make sure that the >> corresponding 'graph_pos' and 'generation' fields of 'struct commit' >> are initialized conservatively, as if the commit were not included in >> the commit-graph file. >> >> Alas, initializing those fields only in alloc_commit_node() missed the >> case when an object that happens to be a commit is first looked up via >> lookup_unknown_object(), and is then later converted to a 'struct >> commit' via the object_as_type() helper function (either calling it >> directly, or as part of a subsequent lookup_commit() call). >> Consequently, both of those fields incorrectly remain set to zero, >> which means e.g. that the commit is present in and is the first entry >> of the commit-graph file. This will result in wrong timestamp, parent >> and root tree hashes, if such a 'struct commit' instance is later >> filled from the commit-graph. >> >> Extract the initialization of 'struct commit's fields from >> alloc_commit_node() into a helper function, and call it from >> object_as_type() as well, to make sure that it properly initializes >> the two commit-graph-related fields, too. With this helper function >> it is hopefully less likely that any new fields added to 'struct >> commit' in the future would remain uninitialized. >> >> With this change alloc_commit_index() won't have any remaining callers >> outside of 'alloc.c', so mark it as static. >> >> Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> >> --- >> >> So, it turns out that ec0c5798ee (revision: use commit graph in >> get_reference(), 2018-12-04) is not the culprit after all, it merely >> highlighted a bug that is as old as the commit-graph feature itself. >> This patch fixes this and all other related issues I reported >> upthread. > > And how/why does this affect 'git describe --dirty'? > > - 'git describe' first iterates over all refs, and somewhere deep > inside for_each_ref() each commit (well, object) a ref points to > is looked up via lookup_unknown_object(). This leaves all fields > of the created object zero initialized. > > - Then it dereferences HEAD for '--dirty' and ec0c5798ee's changes > to get_reference() kick in: lookup_commit() doesn't instantiate a > brand new and freshly initialized 'struct commit', but returns the > object created in the previous step converted into 'struct > commit'. This conversion doesn't set the commit-graph fields in > 'struct commit', but leaves both as zero. get_reference() then > tries to load HEAD's commit information from the commit-graph, > find_commit_in_graph() sees the the still zero 'graph_pos' field > and doesn't perform a search through the commit-graph file, and > the subsequent fill_commit_in_graph() reads the commit info from > the first entry. > > In case of the failing test I posted earlier, where only the first > commit is in the commit-graph but HEAD isn't, this means that the > HEAD's 'struct commit' is filled with the info of HEAD^. > > - Ultimately, the diff machinery then doesn't compare the worktree > to HEAD's tree, but to HEAD^'s, finds that they differ, hence the > incorrect '-dirty' flag in the output. > > Before ec0c5798ee get_reference() simply called parse_object(), which > ignored the commit-graph, so the issue could remain hidden. Thanks for digging in, Szeder. This is a very subtle interaction, and I'm glad you caught the issue. There are likely other ways this could become problematic, including hitting BUG() statements regarding generation numbers. I recommend this be merged to 'maint' if possible. Thanks, -Stolee