Re: [PATCH 1/2] [GSOC] ref-filter: add %(raw) atom

Phillip Wood <phillip.wood123@xxxxxxxxx> · Sat, 29 May 2021 14:23:05 +0100

On 27/05/2021 17:36, Felipe Contreras wrote:
ZheNing Hu via GitGitGadget wrote:
[...]
+static int memcasecmp(const void *vs1, const void *vs2, size_t n)

Why void *? We can declare them as char *.

If you look at how this function is used you'll see
	int (*cmp_fn)(const void *, const void *, size_t);
	cmp_fn = s->sort_flags & REF_SORTING_ICASE
			? memcasecmp : memcmp;

So the signature must match memcmp: the two operands of a ternary 
expression must have compatible types (otherwise the code is not valid 
C), and calling a function through a pointer of a different type is 
undefined behavior as well.
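A minimal sketch of that point: because memcasecmp() below takes exactly the same parameter types as memcmp(), both can appear on either side of the ternary and be called through one pointer type. The `icase` flag stands in for `s->sort_flags & REF_SORTING_ICASE`, and `pick_cmp` and `mem_cmp_fn` are hypothetical names, not part of the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same parameter types as memcmp(), so the two are interchangeable. */
static int memcasecmp(const void *vs1, const void *vs2, size_t n)
{
	const unsigned char *s1 = vs1;
	const unsigned char *s2 = vs2;
	size_t i;

	for (i = 0; i < n; i++) {
		int diff = tolower(s1[i]) - tolower(s2[i]);
		if (diff)
			return diff;
	}
	return 0;
}

/* One pointer type that both memcmp() and memcasecmp() match. */
typedef int (*mem_cmp_fn)(const void *, const void *, size_t);

/* Hypothetical helper: both operands of ?: have type mem_cmp_fn. */
static mem_cmp_fn pick_cmp(int icase)
{
	return icase ? memcasecmp : memcmp;
}
```

Declaring memcasecmp() with `const char *` parameters instead would make the two operands of `?:` incompatible pointer types, which is exactly what the reply above warns about.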

+{
+	size_t i;
+	const char *s1 = (const char *)vs1;
+	const char *s2 = (const char *)vs2;

Then we avoid this extra step.

+	for (i = 0; i < n; i++) {
+		unsigned char u1 = s1[i];
+		unsigned char u2 = s2[i];

There's no need for two entirely new variables...

+		int U1 = toupper (u1);
+		int U2 = toupper (u2);

You can do toupper(s1[i]) directly (BTW, there's an extra space: `foo(x)`,
not `foo (x)`).

While we are at it, why keep a separate index into s1 when s1 is never
used again?

We can simply advance both s1 and s2:

   s1++, s2++

+		int diff = (UCHAR_MAX <= INT_MAX ? U1 - U2
+			: U1 < U2 ? -1 : U2 < U1);

I don't understand what this is supposed to achieve. Both U1 and U2 are
integers, pretty low integers actually.
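For what it's worth, the guard only matters on unusual platforms: if not every unsigned char value fits in the positive range of int (possible only when int is no wider than char), `U1 - U2` could overflow, so the patch falls back to a -1/0/1 three-way comparison. A sketch isolating that expression; `byte_diff` is a hypothetical name:

```c
#include <limits.h>

/*
 * Sign-correct difference of two byte values that have already been
 * through toupper()/tolower(). When every unsigned char value fits in
 * an int, plain subtraction cannot overflow; otherwise fall back to a
 * three-way comparison returning -1, 0 or 1.
 */
static int byte_diff(int c1, int c2)
{
	return UCHAR_MAX <= INT_MAX ? c1 - c2
				    : c1 < c2 ? -1 : c2 < c1;
}
```

On common platforms (8-bit char, 32-bit int) the condition is a compile-time constant that is true, so the expression reduces to `c1 - c2`.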

If we get rid of that complexity we don't even need U1 or U2, just do:

   diff = toupper(u1) - toupper(u2);

+		if (diff)
+			return diff;
+	}
+	return 0;
+}

All we have to do is define the end point, and then we don't need i:

	static int memcasecmp(const char *s1, const char *s2, size_t n)
	{
		const char *end = s1 + n;
		for (; s1 < end; s1++, s2++) {
			int diff = tolower(*s1) - tolower(*s2);
			if (diff)
				return diff;
		}
		return 0;
	}

(and I personally prefer lower to upper)

We should be using tolower(), as that is what POSIX specifies for 
strcasecmp() [1], which we are trying to emulate, and there are cases [2] where
	(tolower(c1) == tolower(c2)) != (toupper(c1) == toupper(c2))
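Putting the two suggestions together, a sketch that keeps the end-pointer loop, uses tolower() as POSIX specifies for strcasecmp(), and keeps the memcmp()-compatible signature. One detail worth flagging: the bytes are read through `const unsigned char *` rather than `const char *`, because passing a negative `char` (any byte above 0x7f where `char` is signed) to tolower() is undefined behavior; that is what the `u1`/`u2` temporaries in the patch guard against.

```c
#include <ctype.h>
#include <stddef.h>

static int memcasecmp(const void *vs1, const void *vs2, size_t n)
{
	/* unsigned char keeps every byte in the range tolower() accepts */
	const unsigned char *s1 = vs1;
	const unsigned char *s2 = vs2;
	const unsigned char *end = s1 + n;

	for (; s1 < end; s1++, s2++) {
		int diff = tolower(*s1) - tolower(*s2);
		if (diff)
			return diff;
	}
	return 0;
}
```

Results still depend on the current locale, just as they do for strcasecmp().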

Best Wishes

Phillip

[1] https://pubs.opengroup.org/onlinepubs/9699919799/
[2] https://en.wikipedia.org/wiki/Dotted_and_dotless_I#In_computing

Check the following resource for a detailed explanation of why my
modified version is considered good taste:

https://github.com/felipec/linked-list-good-taste

  static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
  {
  	struct atom_value *va, *vb;
@@ -2304,6 +2382,7 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
  	int cmp_detached_head = 0;
  	cmp_type cmp_type = used_atom[s->atom].type;
  	struct strbuf err = STRBUF_INIT;
+	size_t slen = 0;
  
  	if (get_ref_atom_value(a, s->atom, &va, &err))
  		die("%s", err.buf);
@@ -2317,10 +2396,32 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
  	} else if (s->sort_flags & REF_SORTING_VERSION) {
  		cmp = versioncmp(va->s, vb->s);
  	} else if (cmp_type == FIELD_STR) {
-		int (*cmp_fn)(const char *, const char *);
-		cmp_fn = s->sort_flags & REF_SORTING_ICASE
-			? strcasecmp : strcmp;
-		cmp = cmp_fn(va->s, vb->s);
+		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
+		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
+			int (*cmp_fn)(const char *, const char *);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? strcasecmp : strcmp;
+			cmp = cmp_fn(va->s, vb->s);
+		} else {
+			int (*cmp_fn)(const void *, const void *, size_t);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? memcasecmp : memcmp;
+
+			if (va->s_size != ATOM_VALUE_S_SIZE_INIT &&
+			    vb->s_size != ATOM_VALUE_S_SIZE_INIT) {
+				cmp = cmp_fn(va->s, vb->s, va->s_size > vb->s_size ?
+				       vb->s_size : va->s_size);
+			} else if (va->s_size == ATOM_VALUE_S_SIZE_INIT) {
+				slen = strlen(va->s);
+				cmp = cmp_fn(va->s, vb->s, slen > vb->s_size ?
+					     vb->s_size : slen);
+			} else {
+				slen = strlen(vb->s);
+				cmp = cmp_fn(va->s, vb->s, slen > va->s_size ?
+					     slen : va->s_size);
+			}
+			cmp = cmp ? cmp : va->s_size - vb->s_size;
+		}

This hurts my eyes. I think the complexity of this chunk warrants a
separate function; then the logic would be easier to see.
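A rough sketch of what such a function could look like. The name `cmp_str_len` and the `mem_cmp_fn` typedef are hypothetical; the caller would first replace each `s_size` equal to `ATOM_VALUE_S_SIZE_INIT` with `strlen()` of that value, and pass `memcasecmp` or `memcmp` depending on `REF_SORTING_ICASE`, as the hunk already does. Resolving both lengths up front leaves a single path, and it sidesteps the last branch of the hunk, which appears to pass the larger of the two lengths to `cmp_fn` and so could read past the end of the shorter value.

```c
#include <stddef.h>
#include <string.h> /* memcmp() is the usual cmp_fn argument */

/* Signature shared by memcmp() and a memcmp()-compatible memcasecmp(). */
typedef int (*mem_cmp_fn)(const void *, const void *, size_t);

/*
 * Hypothetical helper: compare two values whose lengths are already
 * known (embedded NULs allowed). Only the common prefix is handed to
 * cmp_fn, so neither buffer is read past its end; if the prefixes are
 * equal, the shorter value sorts first.
 */
static int cmp_str_len(const char *a, size_t alen,
		       const char *b, size_t blen, mem_cmp_fn cmp_fn)
{
	int cmp = cmp_fn(a, b, alen < blen ? alen : blen);

	if (cmp)
		return cmp;
	/* compare rather than subtract: size_t differences never go negative */
	return (alen > blen) - (alen < blen);
}
```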

Cheers.