Re: [GSoC] Use unsigned integral type for collection of bits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Feb 18, 2024 at 6:37 AM eugenio gigante
<giganteeugenio2@xxxxxxxxx> wrote:
> I was looking around the codebase for some field of a structure that
> stores collections of bits with signed int type.
>
> > ./diffcore.h:63:     signed int is_binary : 2;
>
> The struct in question is "diff_filespec" and Junio commented the
> declaration of the field as following:
>
> /* data should be considered "binary"; -1 means "don't know yet" */
>
> I read somewhere that one should always prefer unsigned integral type over
> signed integral type for a couple of reasons [1].
> These involve operations like Modulus, Shifting and Overflow.

In the context of Git, we want to be using `unsigned` for variables
which are "bags of bits", where each bit indicates some "on" or "off"
property. Very frequently, such variables have "flags" in their names.
So, a possible scenario might be something like this:

    #define OP_FOO 0x01
    #define OP_BAR 0x02
    #define OP_ZAZ 0x04
    ...
    unsigned int flags = OP_FOO | OP_ZAZ;
    ...
    if ((flags & OP_ZAZ))
        do_some_zaz();

> I didn't dig too much into how the field is used and if there are cases in which
> the mentioned operations are involved (I would like the community
> opinion about this topic before).
>
> Moreover, I don’t know if such a change breaks too much code and if
> it’s worth it.
>
> but maybe I'm missing something. For sure, various If conditions used
> by the function
>
> 'diff_filespec_is_binary' inside 'diff.c' would have to be changed.

The code in question is not being used as a "bag of bits". Rather,
it's a tristate binary with values "not-set", "true", and "false".
Whereas a typical binary could be represented by a single bit, this
one needs the extra bit to handle the "not-set" case. Moreover, it is
idiomatic in the Git codebase for -1 to represent "not-set", so I
think this code is fine as-is since its meaning is clear to those
familiar with the codebase, thus does not need any changes made to it.

> Besides, it's possible that my grep command is not enough and maybe
> more "signed int" can be spotted.

There are cases in the codebase in which a signed type is being used
as a "bag of bits" instead of the more desirable unsigned type. If you
are interested in making such a fix, you might find some candidates
using a search such as this:

    git grep -P '(?<!unsigned )int\s+flags'

For example, it finds this instance in `builtin/add.c`:

    static int refresh(int verbose, const struct pathspec *pathspec)
    {
        int flags = REFRESH_IGNORE_SKIP_WORKTREE |
            (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
        ...
        refresh_index(&the_index, flags, pathspec, seen, ...);
    }

Taking a look at `read-cache-ll.h`, we see:

    int refresh_index(struct index_state *, unsigned int flags, ...);

So, refresh_index() is correctly expecting an unsigned value for
`flags` but refresh() in `builtin/add.c` has undesirably declared
`flags` as signed.

> P.S. I was insecure about how to send this email since it does not
> include a commit.
> I decide not to use git-send-email. Hoping I didn't mess up the format.

Using your preferred email client for general discussion is fine. Most
people only use git-send-email for sending patches.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux