Re: [PATCH v3 06/11] commit-graph: add a slab to store topological levels

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 25 Aug 2020 at 09:33, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
> Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> writes:
> > On Fri, Aug 21, 2020 at 08:43:38PM +0200, Jakub Narębski wrote:
> >> "Abhishek Kumar via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> >>
> >>> From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> [...]
> >>> @@ -1335,11 +1341,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >>>                                     _("Computing commit graph generation numbers"),
> >>>                                     ctx->commits.nr);
> >>>     for (i = 0; i < ctx->commits.nr; i++) {
> >>> -           uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
> >>> +           uint32_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
> >>
> >> All right, so that is why this 'generation' variable was not converted
> >> to timestamp_t type.
> >>
> >>>
> >>>             display_progress(ctx->progress, i + 1);
> >>> -           if (generation != GENERATION_NUMBER_V1_INFINITY &&
> >>> -               generation != GENERATION_NUMBER_ZERO)
> >>> +           if (level != GENERATION_NUMBER_V1_INFINITY &&
> >>> +               level != GENERATION_NUMBER_ZERO)
> >>>                     continue;
> >>
> >> Here we use GENERATION_NUMBER*_INFINITY to check if the commit is
> >> outside commit-graph files, and therefore we would need its topological
> >> level computed.
> >>
> >> However, I don't understand how it works.  We have had created the
> >> commit_graph_data_at() and use it instead of commit_graph_data_slab_at()
> >> to provide default values for `struct commit_graph`... but only for
> >> `graph_pos` member.  It is commit_graph_generation() that returns
> >> GENERATION_NUMBER_INFINITY for commits not in graph.
> >>
> >> But neither commit_graph_data_at()->generation nor topo_level_slab_at()
> >> handles this special case, so I don't see how 'generation' variable can
> >> *ever* be GENERATION_NUMBER_INFINITY, and 'level' variable can ever be
> >> GENERATION_NUMBER_V1_INFINITY for commits not in graph.
> >>
> >> Does it work *accidentally*, because the default value for uninitialized
> >> data on commit-slab is 0, which matches GENERATION_NUMBER_ZERO?  It
> >> certainly looks like it does.  And GENERATION_NUMBER_ZERO is an artifact
> >> of commit-graph feature development history, namely the short time where
> >> Git didn't use any generation numbers and stored 0 in the place set for
> >> it in the commit-graph format...  On the other hand this is not the case
> >> for corrected commit date (generation number v2), as it could
> >> "legitimately" be 0 if some root commit (without any parents) had
> >> committerdate of epoch 0, i.e. 1 January 1970 00:00:00 UTC, perhaps
> >> caused by malformed but valid commit object.
> >>
> >> Ugh...
> >
> > It works accidentally.
> >
> > Our decision to avoid the cost of initializing both
> > commit_graph_data->generation and commit_graph_data->graph_pos has
> > led to some unwieldy choices - the complexity of helper functions,
> > bypassing helper functions when writing a commit-graph file [1].
> >
> > I want to re-visit how commit_graph_data slab is defined in a future series.
> >
> > [1]: https://lore.kernel.org/git/be28ab7b-0ae4-2cc5-7f2b-92075de3723a@xxxxxxxxx/
>
> All right, we might want to make use of the fact that the value of 0 for
> topological level here always mean that its value for a commit needs to
> be computed, that 0 is not a valid value for topological levels.
> - if the value 0 came from commit-graph file, it means that it came
>   from Git version that used commit-graph but didn't compute generation
>   numbers; the value is GENERATION_NUMBER_ZERO
> - the value 0 might came from the fact that commit is not in graph,
>   and that commit-slab zero-initializes the values stored; let's
>   call this value GENERATION_NUMBER_UNINITIALIZED
>
> If we ensure that corrected commit date can never be zero (which is
> extremely unlikely, as one of root commits would have to be malformed or
> written on badly misconfigured computer, with value of 0 for committer
> timestamp), then this "happy accident" can keep working.
>
>   As a special case, commit date with timestamp of zero (01.01.1970 00:00:00Z)
>   has corrected commit date of one, to be able to distinguish
>   uninitialized values.
>
> Or something like that.
>
> Actually, it is not even necessary, as corrected commit date of 0 just
> means that this single value (well, for every root commit with commit
> date of 0) would be unnecessary recomputed in compute_generation_numbers().
>
> Anyway, we would want to document this fact in the commit message.

Alternatively, instead of comparing 'level' (and later in series also
'corrected_commit_date') against GENERATION_NUMBER_INFINITY,
we could load at no extra cost `graph_pos` value and compare it
against COMMIT_NOT_FROM_GRAPH.

But with this solution we could never get rid of graph_pos, if we
think it is unnecessary. If we split commit_graph_data into separate
slabs (as it was in early versions of respective patch series), we
would have to pay additional cost.

But it is an alternative.

Best,
-- 
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux