On Thu, Nov 8, 2018 at 9:18 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
> > From: SZEDER Gábor <szeder.dev@xxxxxxxxx>
> >
> > Add a description and place on how to use coccinelle for large refactorings
> > that happen only once.
> >
> > Based-on-work-by: SZEDER Gábor <szeder.dev@xxxxxxxxx>
> > Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx>
> > ---
> >
> > I consider including this patch in a resend instead.
> > It outlays the basics of such a new workflow, which we can refine later.
>
> Thanks for tying loose ends.
>
> > diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README
> > index 9c2f8879c2..fa09d1abcc 100644
> > --- a/contrib/coccinelle/README
> > +++ b/contrib/coccinelle/README
> > @@ -1,2 +1,62 @@
> >  This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
> >  semantic patches that might be useful to developers.
> > +
> > +There are two types of semantic patches:
> > +
> > + * Using the semantic transformation to check for bad patterns in the code;
> > +   This is what the original target 'make coccicheck' is designed to do and
> > +   it is expected that any resulting patch indicates a regression.
> > +   The patches resulting from 'make coccicheck' are small and infrequent,
> > +   so once they are found, they can be sent to the mailing list as per usual.
> > +
> > +   Example for introducing new patterns:
> > +   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
> > +   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
> > +
> > +   Example of fixes using this approach:
> > +   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
> > +               a strbuf, 2018-03-25)
> > +   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
> > +
> > +   These types of semantic patches are usually part of testing, c.f.
> > +   0860a7641b (travis-ci: fail if Coccinelle static analysis found something
> > +               to transform, 2018-07-23)
>
> Yup, and I think what we have in 'pu' (including your the_repository
> stuff) falls into this category.

My impression was that the_repository belongs firmly in the second
category, as the_repository.cocci doesn't fix code smells but proposes
a refactoring that we agreed on doing.

> I've been paying attention to "make coccicheck" produced patches for
> the past few weeks, and so far, I found it _much_ _much_ more
> pleasant, compared to having to worry about merge conflicts with the
> topics in flight that changes day to day (not just because we add
> new topics or update existing topics, but also the order of the
> merge changes as topics mature at different rates and jumps over
> each other in master..pu history), that "make coccicheck" after
> topics are merged to integration branches fixes these issues up as
> needed.

So from your end we would not need the "pending" category
as long as the follow-ups come in a timely manner?

>
> > + 3) Apply the semantic patch only partially, where needed for the patch series
> > +    that motivates the large scale refactoring and then build that series
> > +    on top of it.
>
> It is not quite clear what "needed for the patch series" really
> means in the context of this paragraph.  What are the changes that
> are not needed, that is still produced if we ran "make coccicheck"?

An example for "needed" would be 3f21279f50..bd8737176b
whereas "not needed" is what is in "treewide: apply cocci patch".

The treewide conversion of e.g. unuse_commit_buffer to
repo_unuse_commit_buffer is nice, but "needed" only in its followup
patch that converts logmsg_reencode, as that calls into
unuse_commit_buffer.

> Are they wrong changes (e.g. a carelessly written read_cache() to
> read_index(&the_index) conversion may munge the implementation of
> read_cache(...) { return read_index(&the_index, ...); } and make
> infinite recursion)?
> Or are they "correct but not immediately
> necessary" (e.g. because calling read_cache() does not hurt until
> that function gets removed, so rewriting the callers to call
> read_index() with &the_index may be correct but not immediately
> necessary)?

The latter. I assume correctness (of the semantic patch) to be a given;
this is all about timing, i.e. how can I send the series without
breaking others in flight.

>
> > +    By applying the semantic patch only partially where needed, the series
> > +    is less likely to conflict with other series in flight.
>
> That is correct.
>
> > +    To make it possible to apply the semantic patch partially, there needs
> > +    to be mechanism for backwards compatibility to keep those places working
> > +    where the semantic patch is not applied.  This can be done via a
> > +    wrapper function that has the exact name and signature as the function
> > +    to be changed.
>
> OK, so this argues for leaving read_cache()-like things to help
> other in-flight topics, while a change to encourage the use of
> read_index() takes place.  That also makes sense, and this directly
> relates to "less likely to conflict" benefit you mentioned above.

OK.

>
> But it is still unclear to me then what are "necessary".
>
> ... goes and thinks ...
>
> OK, so a series that allows a codepath to work on an arbitrary
> in-core istate, for example, may need to update a function to take
> istate and use it to call read_index(istate...), and the old code in
> such a call chain must have used read_cache(), always operating on
> &the_index.  For the purpose of that series, it does not matter if
> other codepaths that are not involved in the callchain the series
> wants to update are still only working on &the_index, so a change to
> turn read_cache() into read_index(&the_index) is *not* necessary
> (but still would be correct) and should be left out of the series.
> But any change the series makes to the callchain in question that
> turns read_cache() into read_index() with something call-specific
> (not &the_index) is a necessary one.  Is that a reasonable example
> of what these paragraphs wanted to convey with the distinction
> between "needed for the patch series" and other changes?

Exactly.

Maybe another way to phrase it is to explain the two series
independently of each other:

1) Create the semantic patch series containing
   1a) - a *.pending.cocci semantic patch
   1b) - forward/backward compatibility providers (wrapper/defines)
   1c) then send the semantic patch series to the list

2a) Write the other series as if (1) doesn't exist.
    This means there will be some upgrades of the call chain
    from read_cache() to read_index().
2b) Coincidentally these upgrades are the same as (1) would
    have produced. That's the whole trick.
2c) Send off this series independently of (1).

This can be done for read_cache / read_index as they both
exist already, but when read_index is new to the code base,
we'd need (2) to rely on (1b). And that is why this patch
sounded complicated.
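
To make (1b) and (2a) a bit more concrete, here is a rough sketch in C.
It is not the real code: struct index_state is reduced to two made-up
fields, read_index() only pretends to load entries, and count_entries()
is a hypothetical caller invented for illustration. The point is only
the shape: the old name survives as a thin wrapper, so untouched callers
in other in-flight topics keep compiling, while the call chain that the
series cares about passes its own istate.

```c
/*
 * Sketch only: simplified, hypothetical stand-ins for the real
 * structures and functions; none of this is the actual git code.
 */

/* Made-up, minimal stand-in for the in-core index state. */
struct index_state {
	int cache_nr;    /* number of entries currently loaded */
	int initialized; /* nonzero once read_index() has run */
};

/* Global default instance, playing the role of the_index. */
struct index_state the_index;

/* New-style function: works on whichever index the caller passes in. */
int read_index(struct index_state *istate)
{
	if (!istate->initialized) {
		istate->cache_nr = 3; /* pretend three entries were loaded */
		istate->initialized = 1;
	}
	return istate->cache_nr;
}

/*
 * (1b) Compatibility wrapper: same name and signature as the old
 * function, so callers that were not converted keep working.
 * A semantic patch must leave this body alone; rewriting it would
 * not be a refactoring but a bug (cf. the recursion example above).
 */
int read_cache(void)
{
	return read_index(&the_index);
}

/*
 * (2a) Hypothetical function in the call chain the series updates:
 * it now takes an istate instead of implicitly using the_index.
 */
int count_entries(struct index_state *istate)
{
	return read_index(istate);
}
```

A caller elsewhere that still says read_cache() is "correct but not
immediately necessary" to convert; only count_entries() and its callers
are "needed" for the series.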