Am 18.07.20 um 05:52 schrieb Matheus Tavares: > On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote: >> >> Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think >> this here semantic patch should get you going: >> >> -- snipsnap -- >> @@ >> expression E; >> @@ >> { >> ++ char hex[GIT_MAX_HEXSZ + 1]; >> ... >> - oid_to_hex(E) >> + oid_to_hex_r(hex, E) >> ... >> } >> >> @@ >> expression E1, E2; >> @@ >> { >> ++ char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1]; >> ... >> - oid_to_hex(E1) >> + oid_to_hex_r(hex1, E1) >> ... >> - oid_to_hex(E2) >> + oid_to_hex_r(hex2, E2) >> ... >> } > > Thanks for this nice example! This already worked very well in some of > my tests :) > > However, with my _very_ limited notion of Coccinelle, I didn't > understand why some code snippets didn't match the above rules. For > example, the structure below: > > func(...) > { > if (cond) > func2("%s", oid_to_hex(a)); > } > > I thought it could be because the `if` statement is missing the curly > brackets (and it does work if I add the brackets), but to my surprise, > adding another oid_to_hex() call in an `else` case also made the code > match the rule: > > func(...) > { > if (cond) > func2("%s", oid_to_hex(a)); > else > func2("%s", oid_to_hex(a)); > } > > The following snippet also correctly matches, but spatch introduces only > one `hex` variable: > > if (cond) > func2("%s, %s", oid_to_hex(a), oid_to_hex(b)); > else > func2("%s", oid_to_hex(a)); > The produced diff looks like this: + char hex[GIT_MAX_HEXSZ + 1]; if (cond) - func2("%s, %s", oid_to_hex(a), oid_to_hex(b)); + func2("%s, %s", oid_to_hex_r(hex, a), oid_to_hex_r(hex, b)); else - func2("%s", oid_to_hex(a)); + func2("%s", oid_to_hex_r(hex, a)); So the first rule was applied (the one for a single oid_to_hex call), but we need the second one (for two oid_to_hex calls). Using the same hex buffer for two different values won't work. > We will probably want our semantic rules to handle an arbitrary number > of `oid_to_hex()` calls in each function, but in scenarios like the > above one, we only really need 2 hex buffers despite having 3 calls... oid_to_hex() has two interesting features, and we need to make sure they are preserved for callers that use them: It has a ring of four buffers, so you can use it for four different values, and those buffers are static, so its results can be passed around arbitrarily. > That might be a little tricky, I guess. It may be impossible to cover all cases. E.g. callers of oid_to_hex() could return that buffer (like e.g. diff_aligned_abbrev()) or save them in some global variable (like e.g. string_list_append() with a non-DUP string_list). But safe conversion rules can got us surprisingly far. > Another thing that might be tricky in this conversion is checking for > name conflicts with the added `hex` variable (but maybe Coccinelle > already has a facilitator mechanism for such cases? IDK). That's easy. There exists no hex_buffer, yet. We can just claim that name for automatic conversions. It's a bit too long for people to type out (we have a few variables named hexbuffer, though), but not crazy long. So what is safe? Calls of oid_to_hex() in the argument list of many functions is. E.g. puts() and printf() just consume the string that is passed to them, and they don't store it anywhere. That means no static storage is needed for those. string_list_append() is only safe if it's the duplicating variant. Since this is a runtime option of the underlying string_list this is hard to prove in a Coccinelle rule. The time is better spent converting them manually, I think. And function calls with more than four oid_to_hex() calls are broken already, so we only need to have rules for up to four of them. Here is the simplest rule I can come up with for handling up to four oid_to_hex() calls: @@ identifier fn =~ "^(.*printf.*|puts)$"; @@ ( fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer2, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer3, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer4, ... ), ... ) | fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer2, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer3, ... ), ... ) | fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer2, ... ), ... ) | fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ... ) ) So the sub-rules are ordered from matching all four possible oid_to_hex() calls down to a single one. Only safe consumers are matched. That regex can and should be extended. Having a list of good consumers has the nice side-effect of speeding up the diff generation. It still takes a few minutes on my system, though. We still need to declare of the local buffers. We can add them on demand to each function with rules like this: @avoid_duplicates@ identifier buf =~ "^hex_buffer[2-4]?$"; @@ - char buf[GIT_MAX_HEXSZ + 1]; @hex_buffer4_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer4[GIT_MAX_HEXSZ + 1]; <+... hex_buffer4 ...+> } @hex_buffer3_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer3[GIT_MAX_HEXSZ + 1]; <+... hex_buffer3 ...+> } @hex_buffer2_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer2[GIT_MAX_HEXSZ + 1]; <+... hex_buffer2 ...+> } @hex_buffer_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer[GIT_MAX_HEXSZ + 1]; <+... hex_buffer ...+> } Why remove them first? To avoid duplicates when other convertible oid_to_hex() calls are added later and the semantic patch applied again. Why declare the buffers with function scope? To avoid shadowing. Why are the rules reversed? They add declarations at the top, so hex_buffer to hex_buffer4 end up being declared in that order in the resulting file. Why not use the "fresh identifier" feature of Coccinelle to generate an unused name each time? I don't know how to integrate that into the 4/3/2/1 rule above without having to repeat the list of safe consumers or declaring unused buffers. And reusing buffers between safe consumers is fine. René