ssize_t portability (was: [PATCH 4/6] builtin/stash: provide a way to export stashes to a ref)

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 18 Mar 2022 14:41:57 +0100

On Mon, Mar 14 2022, Phillip Wood wrote:

> Hi Brian and Ævar
> [...]

Hi, sorry about the late reply. Just on this point (which I see is still
unaddressed), and changing $subject for the generic question:

>>> +	for (ssize_t i = nitems - 1; i >= 0; i--) {
>> The ssize_t type can be really small (it's not a signed size_t), so
>> this
>> is unportable, but in practice maybe it's OK...
>
> I'm not really convinced by this ssize_t can be small argument[2], do
> you know of any platforms where it is true?

Where we'd overflow in this code as written? Yes, every platform we
build on, since e.g. on Linux SSIZE_MAX is only half of SIZE_MAX
(rounded down). On my 64-bit box:

    SIZE_MAX  = 18446744073709551615
    SSIZE_MAX = 9223372036854775807

Of course exceeding 2^63-1 or even 2^31-1 number of stashes seems
unlikely in practice.
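
To make that range mismatch concrete, here is a minimal sketch of the
check a caller would need before storing a size_t count in a ssize_t.
The helper name fits_in_ssize_t() is hypothetical (it is not in git's
tree), and it assumes a POSIX system where <limits.h> provides
SSIZE_MAX:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>	/* SSIZE_MAX (POSIX, not ISO C) */
#include <stddef.h>	/* size_t */
#include <stdint.h>	/* SIZE_MAX */

/* Hypothetical helper: can "n" be stored in a ssize_t without overflow? */
static int fits_in_ssize_t(size_t n)
{
	return n <= (size_t)SSIZE_MAX;
}
```

With the values above, fits_in_ssize_t(SIZE_MAX) is false, so a loop
index declared as ssize_t cannot cover every count a size_t can hold.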

If you meant: are there platforms where ssize_t is as small as you can
pedantically make it, i.e. not just the signed half of size_t's range,
but something much smaller?

Possibly, but I think that's unlikely in practice given the
homogenization in computing. Even C just mandated 2's complement!

Although I think it's still important to understand that in the specific
case of ssize_t, unlike other "signed version of type X" the *intent* of
the standard is clearly to *not* mandate that it's a signed version of
size_t. I.e. it's intended for functions like:

     ssize_t read(int fildes, void *buf, size_t nbyte);

Which both per the standard and in practice might have limits that are
much smaller than their respective types. E.g. on my Linux box read(2)
says:

    On Linux, read() (and similar system calls) will transfer at most
    0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
    actually transferred.  (This is true on both 32-bit and 64-bit
    systems.)

I can see how *some* platform might take those liberties in
practice. I.e. 1/2 of your addressable memory/offset != number of bytes
you'd want to return or consider at once for I/O purposes.
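
As an illustration of why callers cannot assume one read() call
transfers "nbyte" bytes, here is a rough sketch of a retry loop. The
name read_fully() and its out-parameter are made up for this example
(it is not a function from git's tree). It keeps the running total in
a size_t and uses ssize_t only for the per-call return value, which is
what the type was designed for:

```c
#include <errno.h>
#include <sys/types.h>	/* ssize_t */
#include <unistd.h>	/* read() */

/*
 * Hypothetical helper: read up to "count" bytes from "fd" into "buf",
 * retrying after short reads and EINTR. Stores the number of bytes
 * read in "*nread". Returns 0 on success or end-of-file, -1 on error.
 */
static int read_fully(int fd, void *buf, size_t count, size_t *nread)
{
	char *p = buf;
	size_t total = 0;

	while (total < count) {
		ssize_t got = read(fd, p + total, count - total);

		if (got < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			*nread = total;
			return -1;
		}
		if (got == 0)
			break;			/* end-of-file */
		total += (size_t)got;
	}
	*nread = total;
	return 0;
}
```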

In any case: yes, we do have existing unportable code in-tree, but
generally with C it's good practice to avoid unportable code when
that's easy. You never know what platform will be released tomorrow
that you might have to scramble to support.

As I noted in this and other similar cases, it's trivial to just use
size_t here. It's just a matter of changing (leaving the C99-specifics
here aside):

	for (i = nitems - 1; i >= 0; i--) {
		item = ary[i];
		[...];

To, as I did in 99d60545f87 (string-list API: change "nr" and "alloc" to
"size_t", 2022-03-07):

	for (i = nitems; i >= 1; i--) {
		item = ary[i - 1];
		[...];

Or even, if repeating "i - 1" is tedious:

	for (cnt = nitems; cnt >= 1; cnt--) {
		size_t i = cnt - 1;

		item = ary[i];
		[...];

Portability concerns aside I think such code is much clearer and easier
to reason about, since you no longer have to carefully squint to see if
we're doing the right thing with the two types in play.
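
For completeness, a self-contained sketch of the second form. The
helper copy_reversed() and its parameters are invented for this
example. Note that just declaring "i" as size_t in the original loop
would not work: "i >= 0" is always true for an unsigned type, and
"nitems - 1" wraps around to SIZE_MAX when nitems is 0. Counting down
from nitems to 1 avoids both problems:

```c
#include <stddef.h>	/* size_t */

/*
 * Hypothetical helper: copy the "nitems" entries of "src" into "dst"
 * in reverse order, counting down with an unsigned counter.
 */
static void copy_reversed(const int *src, int *dst, size_t nitems)
{
	size_t cnt, out = 0;

	/* "cnt" runs nitems..1, so "i" runs nitems-1..0 and never wraps */
	for (cnt = nitems; cnt >= 1; cnt--) {
		size_t i = cnt - 1;

		dst[out++] = src[i];
	}
}
```

With nitems == 0 the loop body never runs, so the empty case needs no
special handling.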