On Fri, Apr 12, 2019 at 04:25:08PM +0200, Johannes Schindelin wrote: > Hi, > > On Fri, 12 Apr 2019, SZEDER Gábor wrote: > > > On Thu, Apr 11, 2019 at 10:58:57PM +0200, SZEDER Gábor wrote: > > > On Thu, Apr 11, 2019 at 10:51:46PM +0200, SZEDER Gábor wrote: > > > > On Wed, Apr 10, 2019 at 10:56:52PM +0200, Johannes Schindelin wrote: > > > > > > ...>} > > > > > > > > > > > > @@ > > > > > > + expression c; > > > > > > ++expression r; > > > > > > + expression s; > > > > > > + @@ > > > > > > +-- get_commit_tree(c) = s > > > > > > ++- repo_get_commit_tree(r, c) = s > > > > > > + + c->maybe_tree = s > > > > > > > > > > I think this is wrong, and admittedly I had the very same version > > > > > originally. > > > > > > > > > > When you have an arbitrary `r` in any `repo_get_commit_tree(r, c)` (as > > > > > opposed to `the_repository`), the conversion to `c->maybe_tree` is most > > > > > likely incorrect. > > > > > > > > > > Therefore, I don't think that we can do that. > > > > > > > > So, as far as I understand, the goal of these 'c->maybe_tree'-related > > > > semantic patches is to prevent "generic" parts of Git from accessing > > > > this field directly, as it might not be initialized in a > > > > commit-graph-enabled repository. > > > > > > > > Only three functions are explicitly exempt, while this last semantic > > > > patch in question implicitly allows a few more that assign a value to > > > > 'c->maybe_tree'. These functions are release_commit_memory() and > > > > parse_commit_buffer() in 'commit.c' and fill_commit_in_graph() in > > > > 'commit-graph.c', > > > > ... and make_virtual_commit() in 'merge-recursive.c'. > > > > > and after a quick look these functions seem to be > > > > rather fundamenal in the life-cycle of a commit object. > > > > > > Erm, not "commit object"; I meant the life-cycle of a 'struct commit' > > > instance. > > > > > > > I think they deserve to be explicitly exempted, too, and then we could > > > > remove this last semantic patch altogether. > > > > And it would look like this. Yeah, that's a very long line there, but > > I don't think we can break it up. > > > > -- >8 -- > > > > diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci > > index 57c8f71479..fe814f313e 100644 > > --- a/contrib/coccinelle/commit.cocci > > +++ b/contrib/coccinelle/commit.cocci > > @@ -10,20 +10,15 @@ expression c; > > - c->maybe_tree->object.oid.hash > > + get_commit_tree_oid(c)->hash > > > > -// These excluded functions must access c->maybe_tree direcly. > > +// These excluded functions must access/modify c->maybe_tree direcly. > > +// Note that if c->maybe_tree is written somewhere outside of these > > +// functions, then the recommended transformation will be bogus with > > +// repo_get_commit_tree() on the LHS. > > @@ > > -identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$"; > > -expression c; > > +identifier f !~ "^(repo_get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit|fill_commit_in_graph|parse_commit_buffer|release_commit_memory|make_virtual_commit)$"; > > Hahahaha! That's *really* long. > > And a good indicator that this should be hidden in a single helper > function (`set_commit_tree()`, file-local of course) that is exempted in > the cocci patch. Note that this is not only about line length, and consider the slight differences between the three approaches: - Currently only direct read accesses to 'c->maybe_tree' outside of the listed functions are forbidden and transformed, but still any function can set this field directly (thanks to the last semantic patch in 'commit.cocci'). - Encapsulating writes in set_commit_tree() and adding this function to that list would prevent other functions from setting 'c->maybe_tree' directly, but still any function could set it indirectly by calling set_commit_tree(). - With that loooong line only those few listed special functions would be able read or write 'c->maybe_tree'. Does the additional restriction of the long line variant bring us benefits? Well, not sure. The root tree of a commit is needed in many places, and in the past we got used to it being always initialized in any 'struct commit' instance. However, with the commit graph that's not the case anymore, and any read accesses to the uninitialized root tree object would have bad consequences. That's why there is get_commit_tree() helper performing lazy-initialization, and the protection from direct reads in the form of the semantic patch. OTOH, apart from five functions, most parts of Git simply don't want to create new 'struct commit' instances themselves, or, more generally, set anything in a 'struct commit', so arguably there is not that much danger to protect ourselves from. Anyway, I just wanted to make sure that we fully understand the implications, and I think any solution is an improvement that eliminates the current "let's transform this code pattern, but then quickly transform it back in some cases" back-and-forth. > > +struct commit *c; > > @@ > > f(...) {<... > > - c->maybe_tree > > + repo_get_commit_tree(the_repository, c) > > ...>} > > - > > -@@ > > -expression c; > > -expression r; > > -expression s; > > -@@ > > -- repo_get_commit_tree(r, c) = s > > -+ c->maybe_tree = s > >