On Sun, Feb 18, 2024 at 6:37 AM eugenio gigante <giganteeugenio2@xxxxxxxxx> wrote: > I was looking around the codebase for some field of a structure that > stores collections of bits with signed int type. > > > ./diffcore.h:63: signed int is_binary : 2; > > The struct in question is "diff_filespec" and Junio commented the > declaration of the field as following: > > /* data should be considered "binary"; -1 means "don't know yet" */ > > I read somewhere that one should always prefer unsigned integral type over > signed integral type for a couple of reasons [1]. > These involve operations like Modulus, Shifting and Overflow. In the context of Git, we want to be using `unsigned` for variables which are "bags of bits", where each bit indicates some "on" or "off" property. Very frequently, such variables have "flags" in their names. So, a possible scenario might be something like this: #define OP_FOO 0x01 #define OP_BAR 0x02 #define OP_ZAZ 0x04 ... unsigned int flags = OP_FOO | OP_ZAZ; ... if ((flags & OP_ZAZ)) do_some_zaz(); > I didn't dig too much into how the field is used and if there are cases in which > the mentioned operations are involved (I would like the community > opinion about this topic before). > > Moreover, I don’t know if such a change breaks too much code and if > it’s worth it. > > but maybe I'm missing something. For sure, various If conditions used > by the function > > 'diff_filespec_is_binary' inside 'diff.c' would have to be changed. The code in question is not being used as a "bag of bits". Rather, it's a tristate binary with values "not-set", "true", and "false". Whereas a typical binary could be represented by a single bit, this one needs the extra bit to handle the "not-set" case. Moreover, it is idiomatic in the Git codebase for -1 to represent "not-set", so I think this code is fine as-is since its meaning is clear to those familiar with the codebase, thus does not need any changes made to it. > Besides, it's possible that my grep command is not enough and maybe > more "signed int" can be spotted. There are cases in the codebase in which a signed type is being used as a "bag of bits" instead of the more desirable unsigned type. If you are interested in making such a fix, you might find some candidates using a search such as this: git grep -P '(?<!unsigned )int\s+flags' For example, it finds this instance in `builtin/add.c`: static int refresh(int verbose, const struct pathspec *pathspec) { int flags = REFRESH_IGNORE_SKIP_WORKTREE | (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET); ... refresh_index(&the_index, flags, pathspec, seen, ...); } Taking a look at `read-cache-ll.h`, we see: int refresh_index(struct index_state *, unsigned int flags, ...); So, refresh_index() is correctly expecting an unsigned value for `flags` but refresh() in `builtin/add.c` has undesirably declared `flags` as signed. > P.S. I was insecure about how to send this email since it does not > include a commit. > I decide not to use git-send-email. Hoping I didn't mess up the format. Using your preferred email client for general discussion is fine. Most people only use git-send-email for sending patches.