Re: [PATCH v3 00/33] nd/sha1-name-c-wo-the-repository updates

SZEDER Gábor <szeder.dev@xxxxxxxxx> · Sat, 13 Apr 2019 14:14:32 +0200

On Fri, Apr 12, 2019 at 04:25:08PM +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Fri, 12 Apr 2019, SZEDER Gábor wrote:
> 
> > On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote:
> > > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote:
> > > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote:
> > > > > >         ...>}
> > > > > >
> > > > > >       @@
> > > > > >     + expression c;
> > > > > >     ++expression r;
> > > > > >     + expression s;
> > > > > >     + @@
> > > > > >     +-- get_commit_tree(c) = s
> > > > > >     ++- repo_get_commit_tree(r, c) = s
> > > > > >     + + c->maybe_tree = s
> > > > >
> > > > > I think this is wrong, and admittedly I had the very same version
> > > > > originally.
> > > > >
> > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as
> > > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most
> > > > > likely incorrect.
> > > > >
> > > > > Therefore, I don't think that we can do that.
> > > >
> > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related
> > > > semantic patches is to prevent "generic" parts of Git from accessing
> > > > this field directly, as it might not be initialized in a
> > > > commit-graph-enabled repository.
> > > >
> > > > Only three functions are explicitly exempt, while this last semantic
> > > > patch in question implicitly allows a few more that assign a value to
> > > > 'c->maybe_tree'.  These functions are release_commit_memory() and
> > > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in
> > > > 'commit-graph.c',
> >
> > ... and make_virtual_commit() in 'merge-recursive.c'.
> >
> > > and after a quick look these functions seem to be
> > > > rather fundamenal in the life-cycle of a commit object.
> > >
> > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit'
> > > instance.
> > >
> > > > I think they deserve to be explicitly exempted, too, and then we could
> > > > remove this last semantic patch altogether.
> >
> > And it would look like this.  Yeah, that's a very long line there, but
> > I don't think we can break it up.
> >
> >   -- >8 --
> >
> > diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
> > index 57c8f71479..fe814f313e 100644
> > --- a/contrib/coccinelle/commit.cocci
> > +++ b/contrib/coccinelle/commit.cocci
> > @@ -10,20 +10,15 @@ expression c;
> >  - c->maybe_tree->object.oid.hash
> >  + get_commit_tree_oid(c)->hash
> >
> > -// These excluded functions must access c->maybe_tree direcly.
> > +// These excluded functions must access/modify c->maybe_tree direcly.
> > +// Note that if c->maybe_tree is written somewhere outside of these
> > +// functions, then the recommended transformation will be bogus with
> > +// repo_get_commit_tree() on the LHS.
> >  @@
> > -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
> > -expression c;
> > +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$";
> 
> Hahahaha! That's *really* long.
> 
> And a good indicator that this should be hidden in a single helper
> function (`set_commit_tree()`, file-local of course) that is exempted in
> the cocci patch.

Note that this is not only about line length, and consider the slight
differences between the three approaches:

  - Currently only direct read accesses to 'c->maybe_tree' outside of
    the listed functions are forbidden and transformed, but still any
    function can set this field directly (thanks to the last semantic
    patch in 'commit.cocci').

  - Encapsulating writes in set_commit_tree() and adding this function
    to that list would prevent other functions from setting
    'c->maybe_tree' directly, but still any function could set it
    indirectly by calling set_commit_tree().

  - With that loooong line only those few listed special functions
    would be able read or write 'c->maybe_tree'.

Does the additional restriction of the long line variant bring us
benefits?  Well, not sure.

The root tree of a commit is needed in many places, and in the past we
got used to it being always initialized in any 'struct commit'
instance.  However, with the commit graph that's not the case anymore,
and any read accesses to the uninitialized root tree object would have
bad consequences.  That's why there is get_commit_tree() helper
performing lazy-initialization, and the protection from direct reads
in the form of the semantic patch.

OTOH, apart from five functions, most parts of Git simply don't want
to create new 'struct commit' instances themselves, or, more
generally, set anything in a 'struct commit', so arguably there is not
that much danger to protect ourselves from.

Anyway, I just wanted to make sure that we fully understand the
implications, and I think any solution is an improvement that
eliminates the current "let's transform this code pattern, but then
quickly transform it back in some cases" back-and-forth.

> > +struct commit *c;
> >  @@
> >    f(...) {<...
> >  - c->maybe_tree
> >  + repo_get_commit_tree(the_repository, c)
> >    ...>}
> > -
> > -@@
> > -expression c;
> > -expression r;
> > -expression s;
> > -@@
> > -- repo_get_commit_tree(r, c) = s
> > -+ c->maybe_tree = s
> >