"Abhishek Kumar via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > > Both fill_commit_graph_info() and fill_commit_in_graph() parse > information present in commit data chunk. Let's simplify the > implementation by calling fill_commit_graph_info() within > fill_commit_in_graph(). > > The test 'generate tar with future mtime' creates a commit with commit > time of (2 ^ 36 + 1) seconds since EPOCH. The commit time overflows into > generation number (within CDAT chunk) and has undefined behavior. > > The test used to pass Could you please tell us how does this test starts to fail without the change to the test described there? What is the error message, etc.? > as fill_commit_in_graph() guarantees the values of ^^^^^^^^^^^^^^^^^^^^^^ s/fill_commit_in_graph()/fill_commit_graph_info()/ It is fill_commit_graph_info() that changes its behavior in this patch. > graph position and generation number, and did not load timestamp. > However, with corrected commit date we will need load the timestamp as > well to populate the generation number. > > Let's fix the test by setting a timestamp of (2 ^ 34 - 1) seconds. I think this commit should be split into two commits: - fix to the 'generate tar with future mtime' test - simplify implementation of fill_commit_in_graph() The test 'generate tar with future mtime' in t/t5000-tar-tree.sh creates a commit with commit time of (2 ^ 36 - 1) seconds since EPOCH (68719476737). However, the commit-graph file format version 1 provides only 34-bits for storing committer date (32 + 2 bits), not 64-bits. Therefore maximum far in the future commit time can only be at most (2 ^ 34 - 1) seconds since EPOCH, as Stolee said in commet for v1 of this series. This "limitation" is not a problem in practice, because the maximum timestamp allowed takes place in the year 2514. I hope at that time there would be no Git version in use that still crashes on changing the version field in the commit-graph format -- then we can simply get rid of storing topological levels (generation number v1) in those 30 bits of CDAT chunk and use full 64 bits for committer date. Git does not perform any bounds checking for committer date value in write_graph_chunk_data(): uint32_t packedDate[2]; /* ... */ if (sizeof((*list)->date) > 4) packedDate[0] = htonl(((*list)->date >> 32) & 0x3); else packedDate[0] = 0; packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); This means that the date is trimmed to 34 bits on save discarding most significant bits, assuming that unsigned overflow simply discards most significant bits truncating the signed (?) value. In this case running the test with GIT_TEST_COMMIT_GRAPH=1 would lead to errors, as the committer date read from the commit graph would be incorrect, and therefore generation number v2 would also be incorrect. I don't quite understand however how second part of this patch in its current iteration, namely simplifing the implementation of fill_commit_in_graph() makes this bug / error shows... Do I understand it correctly that before this change the committer date would always be parsed out of the commit object, instead of reading it from the commit-graph file? However the only user of static fill_commit_in_graph() is the parse_commit_in_graph(), which in turn is used by parse_commit_gently(); but fill_commit_in_graph() read commit date from commit-graph before this change... color me confused. Ah, after the change fill_commit_graph_info() changes its behavior, not fill_commit_in_graph() as said in the commit message. Before this commit it used to only load graph position and generation number, and did not load the timestamp. The function fill_commit_graph_info() is used in turn by public-facing load_commit_graph_info(): /* * It is possible that we loaded commit contents from the commit buffer, * but we also want to ensure the commit-graph content is correctly * checked and filled. Fill the graph_pos and generation members of * the given commit. */ void load_commit_graph_info(struct repository *r, struct commit *item); This function is used in turn by get_bloom_filter(), contains_tag_algo() and parse_commit_buffer(), change in any of which behavior can lead to failing 'generate tar with future mtime' test. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > --- > commit-graph.c | 29 +++++++++++------------------ > t/t5000-tar-tree.sh | 4 ++-- > 2 files changed, 13 insertions(+), 20 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index ace7400a1a..af8d9cc45e 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -725,15 +725,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, > const unsigned char *commit_data; > struct commit_graph_data *graph_data; > uint32_t lex_index; > + uint64_t date_high, date_low; > > while (pos < g->num_commits_in_base) > g = g->base_graph; > > + if (pos >= g->num_commits + g->num_commits_in_base) > + die(_("invalid commit position. commit-graph is likely corrupt")); > + All right, if we want to use fill_commit_graph_info() function to load the graph data (graph position and generation number) in the fill_commit_in_graph() we need to perform this check. I'd think that this check should be here from the beginning, just in case. Sidenote: I wonder if it would be good idea to print more information in the above error message, for example: die(_("invalid commit position %ld. commit-graph '%s' is likely corrupt"), pos, g->filename); But this is unrelated thing, tangential to this change, and it might not add anything useful. > lex_index = pos - g->num_commits_in_base; > commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; > > graph_data = commit_graph_data_at(item); > graph_data->graph_pos = pos; > + > + date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; > + date_low = get_be32(commit_data + g->hash_len + 12); > + item->date = (timestamp_t)((date_high << 32) | date_low); > + I think this change, moving loading of commit date from commit-graph out of fill_commit_in_graph() and into fill_commit_graph_info(), is in my opinion a bit inadequatly described in the commit message. As I understand it this change prepares fill_commit_graph_info() for generation number v2, that is Corrected Commit Date, where loading commit date from CDAT together with loading offset from GDAT would be necessary to correctly set the 'generation' field of 'struct commit_graph_data' (on the commit_graph_data_slab). I'm not sure if it would be worth it splitting this refactoring change (Move Statements into Function) into a separate patch -- it would split this commit into three, changing 11 part series into 13 part series. Note that we might want to update the description of load_commit_graph_info() in commit-graph.h to include that it incidentally loads commit date from the commit-graph. Butthis might be not worth it -- it is a side effect, not the major goal of this function. > graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; > } > > @@ -748,38 +757,22 @@ static int fill_commit_in_graph(struct repository *r, > { > uint32_t edge_value; > uint32_t *parent_data_ptr; > - uint64_t date_low, date_high; > struct commit_list **pptr; > - struct commit_graph_data *graph_data; > const unsigned char *commit_data; > uint32_t lex_index; > > while (pos < g->num_commits_in_base) > g = g->base_graph; > > - if (pos >= g->num_commits + g->num_commits_in_base) > - die(_("invalid commit position. commit-graph is likely corrupt")); > + fill_commit_graph_info(item, g, pos); All right, the check got moved into fill_commit_graph_info(). > > - /* > - * Store the "full" position, but then use the > - * "local" position for the rest of the calculation. > - */ > - graph_data = commit_graph_data_at(item); > - graph_data->graph_pos = pos; All right, 'graph_pos' field in the graph data (on commit slab) got filled by just called load_commit_graph_info(). > lex_index = pos - g->num_commits_in_base; > - > - commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; > + commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; All right, unrelated cleanup in the neighbourhood. > > item->object.parsed = 1; > > set_commit_tree(item, NULL); > > - date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; > - date_low = get_be32(commit_data + g->hash_len + 12); > - item->date = (timestamp_t)((date_high << 32) | date_low); > - All right, this code got moved down the call chain into just called load_commit_graph_info(). > - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; > - All right, 'generation' field in the graph data (on commit slab) got filled by load_commit_graph_info() called at the beginning of the function. > pptr = &item->parents; > > edge_value = get_be32(commit_data + g->hash_len); > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh > index 37655a237c..1986354fc3 100755 > --- a/t/t5000-tar-tree.sh > +++ b/t/t5000-tar-tree.sh > @@ -406,7 +406,7 @@ test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' ' > rm -f .git/index && > echo content >file && > git add file && > - GIT_COMMITTER_DATE="@68719476737 +0000" \ > + GIT_COMMITTER_DATE="@17179869183 +0000" \ > git commit -m "tempori parendum" > ' > > @@ -415,7 +415,7 @@ test_expect_success TIME_IS_64BIT 'generate tar with future mtime' ' > ' > > test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' ' > - echo 4147 >expect && > + echo 2514 >expect && > tar_info future.tar | cut -d" " -f2 >actual && > test_cmp expect actual > ' Looks good to me. Best, -- Jakub Narębski