Re: [PATCH v3 1/4] alloc: introduce parsed_commits_count

Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> · Sun, 14 Jun 2020 11:39:38 +0530

On 13.06.2020, Jakub Narębski wrote:

> On 12.06.2020, Abhishek Kumar wrote:
> 
> > Commit slab relies on uniqueness of commit->index to access data. As
> > submodules are repositories on their own, alloc_commit_index() (which
> > depends on repository->parsed_objects->commit_count) no longer
> > returns unique values.
> > 
> > This would break tests once we move `generation` and `graph_pos` into
> > a
> > commit slab, as commits of supermodule and submodule can have the same
> > index but must have different graph positions.
> 
> First, commits of supermodule and of submodule are in different graphs,
> so I don't see why they have to be in the same serialized commit-graph
> file.
> 

I should have been clearer in the commit message:

commit->index = ith commit allocated for the repository.

As both supermodule and submodule are repositories, they can have
commits with the same index.

This means that were both the ith commit allocated for supermodule and
submodule respectively.

However, the commit-slab code treats them identically (because of the
same index) and they (incorrectly) have same generation number and graph
position.

Switching to a global counter instead of per-repo counter avoids having
two commits with the same index in the first place.

> Second, Git stores many different types of information on slab already.
> How comes that we have not had any problems till now?  
> 
> There is contains_cache, commit_seen, indegree_slab, author_date_slab,
> commit_base, commit_pos, bloom_filter_slab, buffer_slab,
> commit_rev_name,
> commit_names, commit_name_slab, saved_parents, blame_suspects,
> commit_todo_item.
> 

That's a fair point - It seems rather unlikely that we had not faced
this problem yet.

> > 
> > Let's introduce a counter variable, `parsed_commits_count` to keep
> > track
> > of parsed commits so far.
> 
> All right, thought it might be worth mentioning that it is a global
> variable, or rather a static variable in a function.
> 

Alright, noted.

> > 
> > 
> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> > ---
> > 
> > CI Build for the failing tests:
> > https://travis-ci.com/github/abhishekkumar2718/git/jobs/345413840
> 
> The failed tests are, from what I see:
> - t4060-diff-submodule-option-diff-format.sh
> - t4041-diff-submodule-option.sh
> - t4059-diff-submodule-not-initialized.sh
> 
> 
> > 
> >   alloc.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/alloc.c b/alloc.c
> > index 1c64c4dd16..29f0e3aa80 100644
> > --- a/alloc.c
> > +++ b/alloc.c
> > @@ -101,7 +101,9 @@ void *alloc_object_node(struct repository *r)
> >   
> >   static unsigned int alloc_commit_index(struct repository *r)
> >   {
> > -     return r->parsed_objects->commit_count++;
> > +     static unsigned int parsed_commits_count = 0;
> > +     r->parsed_objects->commit_count++;
> 
> Do we use r->parsed_objects->commit_count anywhere?

We don't use it anywhere else. 14ba97f8 (alloc: allow arbitary
repositories for alloc functions, 2019-05-15) made the count per
repository but it's only been used to assign commit index.

Junio suggested that we could remove commit_count from the struct and
clean up function calls a bit.

> 
> > +     return parsed_commits_count++;
> 
> Does it matter that it is not thread safe, because it is not atomic?
> Shouldn't it be
> 
>   +     static _Atomic unsigned int parsed_commits_count = 0;
> 
>   or
> 
>     +   static _Atomic unsigned int parsed_commits_count =
>     ATOMIC_VAR_INIT(0);
> 
>     (If it is allowed).
> 
>     >   }
>     >   
>     >   void init_commit_node(struct repository *r, struct commit *c)
>     > 

Thanks
Abhishek