Am 07.07.22 um 20:10 schrieb Junio C Hamano: > While our eyes are on array.cocci, I have a few observations on it. > > This is not meant specifically to you, Ævar, but comments by those > more familiar with Coccinelle (and I am sure the bar to pass is > fairly low, as I am not all that familiar) are very much > appreciated. > > @@ > expression dst, src, n, E; > @@ > memcpy(dst, src, n * sizeof( > - E[...] > + *(E) > )) > > This seems to force us prefer sizeof(*(E)) over sizeof(E[i]), when > it is used to compute the byte size of memcpy() operation. There is > no reason to prefer one over the other, but I presume it is there > only for convenience for the other rules in this file (I recall > vaguely reading somewhere that these rules do not "execute" from top > to bottom, so I wonder how effective it is?). It halves the number of syntax variants to deal with. > > @@ > type T; > T *ptr; > T[] arr; > expression E, n; > @@ > ( > memcpy(ptr, E, > - n * sizeof(*(ptr)) > + n * sizeof(T) > ) > | > memcpy(arr, E, > - n * sizeof(*(arr)) > + n * sizeof(T) > ) > | > memcpy(E, ptr, > - n * sizeof(*(ptr)) > + n * sizeof(T) > ) > | > memcpy(E, arr, > - n * sizeof(*(arr)) > + n * sizeof(T) > ) > ) > > Likewise, but this one is a lot worse. > > Taken alone, sizeof(*(ptr)) is far more preferrable than sizeof(T), > because the code will be more maintainable. > > Side Note. I know builtin/mv.c had this type mismatch between > the variable and sizeof() from the beginning when 11be42a4 (Make > git-mv a builtin, 2006-07-26) introduced both the variable > declaration "const char **source" and memmove() on it, but a > code that starts out with "char **src" with memmove() that moves > part of src[] and uses sizeof(char *) to compute the byte size > of the move would become broken the same way when a later > developer tightens the declaration to use "const char **src" > without realizing that they have to update the type used in > sizeof(). > > So even though I am guessing that this is to allow the later rules > to worry only about sizeof(T), I am a bit unhappy to see the rule. > If an existing code matched this rule and get rewritten to use > sizeof(T), not sizeof(*(ptr)), but did not match the other rules to > be rewritten to use COPY_ARRAY(), the overall effect would be that > the automation made the code worse. True. > > @@ > type T; > T *dst_ptr; > T *src_ptr; > T[] dst_arr; > T[] src_arr; > expression n; > @@ > ( > - memcpy(dst_ptr, src_ptr, (n) * sizeof(T)) > + COPY_ARRAY(dst_ptr, src_ptr, n) > | > - memcpy(dst_ptr, src_arr, (n) * sizeof(T)) > + COPY_ARRAY(dst_ptr, src_arr, n) > | > - memcpy(dst_arr, src_ptr, (n) * sizeof(T)) > + COPY_ARRAY(dst_arr, src_ptr, n) > | > - memcpy(dst_arr, src_arr, (n) * sizeof(T)) > + COPY_ARRAY(dst_arr, src_arr, n) > ) > > I take it that thanks to the earlier "meh -- between sizeof(*p) and > sizeof(p[0]) there is no reason to prefer one over the other" and > "oh, no, we should prefer sizeof(*p) not sizeof(typeof(*p)) but this > one is the other way around" rules, this one only has to deal with > sizeof(T). > > Am I reading it correctly? Yes. Without the ugly normalization step in the middle could either use twelve cases instead of four here or use inline alternatives, e.g.: type T; T *dst_ptr; T *src_ptr; T[] dst_arr; T[] src_arr; expression n; @@ ( - memcpy(dst_ptr, src_ptr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) ) + COPY_ARRAY(dst_ptr, src_ptr, n) | - memcpy(dst_ptr, src_arr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_arr)) \| sizeof(T) \) ) + COPY_ARRAY(dst_ptr, src_arr, n) | - memcpy(dst_arr, src_ptr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) ) + COPY_ARRAY(dst_arr, src_ptr, n) | - memcpy(dst_arr, src_arr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_arr)) \| sizeof(T) \) ) + COPY_ARRAY(dst_arr, src_arr, n) ) I seem to remember that rules like this missed some cases, but perhaps that's no longer an issue with the latest Coccinelle version? > > @@ > type T; > T *dst; > T *src; > expression n; > @@ > ( > - memmove(dst, src, (n) * sizeof(*dst)); > + MOVE_ARRAY(dst, src, n); > | > - memmove(dst, src, (n) * sizeof(*src)); > + MOVE_ARRAY(dst, src, n); > | > - memmove(dst, src, (n) * sizeof(T)); > + MOVE_ARRAY(dst, src, n); > ) > > What I find interesting is that this one seems to be able to do the > necessary rewrite without having to do the "turn everything into > sizeof(T) first" trick. If this approach works well, I'd rather see > the COPY_ARRAY() done without the first two preliminary rewrite > rules. It doesn't support arrays (T[]). That doesn't matter in practice because we don't have such cases (yet?). > > I wonder if the pattern in the first rule catches sizeof(dst[0]) > instead of sizeof(*dst), though. It doesn't. > > @@ > type T; > T *ptr; > expression n; > @@ > - ptr = xmalloc((n) * sizeof(*ptr)); > + ALLOC_ARRAY(ptr, n); > > @@ > type T; > T *ptr; > expression n; > @@ > - ptr = xmalloc((n) * sizeof(T)); > + ALLOC_ARRAY(ptr, n); > > Is it a no-op rewrite if we replace the above two rules with > something like: > > . @@ > . type T; > . T *ptr; > . expression n; > . @@ > . ( > . - ptr = xmalloc((n) * sizeof(*ptr)); > . + ALLOC_ARRAY(ptr, n); > . | > . - ptr = xmalloc((n) * sizeof(T)); > . + ALLOC_ARRAY(ptr, n); > . ) I think so. > > or even following the pattern of the next one ... > > . @@ > . type T; > . T *ptr; > . expression n; > . @@ > . - ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \)) > . + ALLOC_ARRAY(ptr, n); > > ... I have to wonder? I like the simplicity of this pattern. In theory this is equivalent. > > @@ > type T; > T *ptr; > expression n != 1; > @@ > - ptr = xcalloc(n, \( sizeof(*ptr) \| sizeof(T) \) ) > + CALLOC_ARRAY(ptr, n) > > And this leaves xcalloc(1, ...) alone because it is a way to get a > cleared block of memory that may not be an array at all. Shouldn't > we have "n != 1" for xmalloc rule as well, I wonder, if only for > consistency? I agree that a single-element array is a bit awkward, so allowing the explicit sizeof in that case is less iffy. ALLOC/CALLOC macros for single items might make that automation more palatable.. René