On 7/28/2020 5:13 AM, Abhishek Kumar via GitGitGadget wrote: > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > > Both fill_commit_graph_info() and fill_commit_in_graph() parse > information present in commit data chunk. Let's simplify the > implementation by calling fill_commit_graph_info() within > fill_commit_in_graph(). > > The test 'generate tar with future mtime' creates a commit with commit > time of (2 ^ 36 + 1) seconds since EPOCH. The commit time overflows into > generation number and has undefined behavior. The test used to pass as > fill_commit_in_graph() did not read commit time from commit graph, > reading commit date from odb instead. I was first confused as to why fill_commit_graph_info() did not load the timestamp, but the reason is that it is only used by two methods: 1. fill_commit_in_graph(): this actually leaves the commit in a "parsed" state, so the date must be correct. Thus, it parses the date out of the commit-graph. 2. load_commit_graph_info(): this only helps to guarantee we know the graph_pos and generation number values. Perhaps add this extra context: you will _need_ the commit date from the commit-graph in order to populate the generation number v2 in fill_commit_graph_info(). > Let's fix that by setting commit time of (2 ^ 34 - 1) seconds. The timestamp limit placed in the commit-graph is more restrictive than 64-bit timestamps, but as your test points out, the maximum timestamp allowed takes place in the year 2514. That is far enough away for all real data. > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > --- > commit-graph.c | 31 ++++++++++++------------------- > t/t5000-tar-tree.sh | 4 ++-- > 2 files changed, 14 insertions(+), 21 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 5d3c9bd23c..204eb454b2 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -735,15 +735,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, > const unsigned char *commit_data; > struct commit_graph_data *graph_data; > uint32_t lex_index; > + uint64_t date_high, date_low; > > while (pos < g->num_commits_in_base) > g = g->base_graph; > > + if (pos >= g->num_commits + g->num_commits_in_base) > + die(_("invalid commit position. commit-graph is likely corrupt")); > + > lex_index = pos - g->num_commits_in_base; > commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; > > graph_data = commit_graph_data_at(item); > graph_data->graph_pos = pos; > + > + date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; > + date_low = get_be32(commit_data + g->hash_len + 12); > + item->date = (timestamp_t)((date_high << 32) | date_low); > + > graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; > } > > @@ -758,38 +767,22 @@ static int fill_commit_in_graph(struct repository *r, > { > uint32_t edge_value; > uint32_t *parent_data_ptr; > - uint64_t date_low, date_high; > struct commit_list **pptr; > - struct commit_graph_data *graph_data; > const unsigned char *commit_data; > uint32_t lex_index; > > + fill_commit_graph_info(item, g, pos); > + > while (pos < g->num_commits_in_base) > g = g->base_graph; This 'while' loop happens in both implementations, so you could save a miniscule amount of time by placing the call to fill_commit_graph_info() after the while loop. > - if (pos >= g->num_commits + g->num_commits_in_base) > - die(_("invalid commit position. commit-graph is likely corrupt")); > - /* > - * Store the "full" position, but then use the > - * "local" position for the rest of the calculation. > - */ > - graph_data = commit_graph_data_at(item); > - graph_data->graph_pos = pos; > lex_index = pos - g->num_commits_in_base; > - > - commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; > + commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; I was about to complain about this change, but GRAPH_DATA_WIDTH is a macro that does an equivalent thing (except the_hash_algo->rawsz instead of g->hash_len). > > item->object.parsed = 1; > > set_commit_tree(item, NULL); > > - date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; > - date_low = get_be32(commit_data + g->hash_len + 12); > - item->date = (timestamp_t)((date_high << 32) | date_low); > - > - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; > - > pptr = &item->parents; > > edge_value = get_be32(commit_data + g->hash_len); > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh > index 37655a237c..1986354fc3 100755 > --- a/t/t5000-tar-tree.sh > +++ b/t/t5000-tar-tree.sh > @@ -406,7 +406,7 @@ test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' ' > rm -f .git/index && > echo content >file && > git add file && > - GIT_COMMITTER_DATE="@68719476737 +0000" \ > + GIT_COMMITTER_DATE="@17179869183 +0000" \ > git commit -m "tempori parendum" > ' > > @@ -415,7 +415,7 @@ test_expect_success TIME_IS_64BIT 'generate tar with future mtime' ' > ' > > test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' ' > - echo 4147 >expect && > + echo 2514 >expect && > tar_info future.tar | cut -d" " -f2 >actual && > test_cmp expect actual > ' > Thanks, -Stolee