Re: strlen

johnsfine--- via Gcc-help <gcc-help@xxxxxxxxxxx> · Fri, 9 Jul 2021 19:51:51 +0000 (UTC)

 > Choosing an unsigned type has liabilities as well.  Because unsigned
> arithmetic wraps around zero whereas signed arithmetic does not (it
> overflows which is undefined), its results are less constrained than
> those of signed arithmetic.
In the vast majority of real programming cases in which the defined overflow of unsigned causes a significant performance weakness, signed has a different but typically worse performance weakness.
I wish gcc had some attribute to correct this flaw.  (If that has been added since I last checked, please tell me).
The common example (that an expert will recognize as an example of a more common issue) is:
p[x+1]  where x is a 32 bit unsigned.

p[ size_t(x) + 1 ] very often generates smaller/faster code.  But that is too tedious for the programmer to insert and the programmer should not need to know when that would help.

The programmer often knows that size_t(x)+1 would behave the same as size_t(x+1), but the compiler does not know that.The compiler knows which of size_t(x)+1 vs. size_t(x+1) is better in a specific usage.  The programmer may know that one or the other can be better, but shouldn't need to worry about which.

Usually those who want this feature express it as a data type like uint32_t but with undefined overflow.  So the compiler can then know that size_t(x)+1 would behave the same as size_t(x+1) because the code behavior would be undefined if that were not true.
I would prefer an "unspecified promotion" attribute to an "undefined overflow" attribute, so a type that is stored as 32 bit unsigned, but at the compiler's convenience MIGHT be promoted to a larger type anywhere up the path (within the tree) from where it occurs to where it would necessarily stop being a 32 bit unsigned anyway.
Example p[x*q+r] in the common case that p is of some type such that this is actually an inlined call to operator[](size_t)
assuming q and r are also smaller than size_t, and x is of the type with unspecified promotion, we have size_t(x*q+r) but the compiler should freely be able to choose size_t(x*q)+r or size_t(x)*q+r, whichever of those optimizes best.
In almost all cases that really matter, such an unspecified promotion vs. undefined overflow would allow exactly the same optimizations.  In the obscure cases where one would allow an optimization that the other would not, I think the programmer would be much less surprised by the "unspecified promotion" behavior than "undefined overflow".
But all of this is just a wish for slightly better.  In the world as it exists now, choosing a 32 bit unsigned as an index type produces smaller faster code than either a 64 bit unsigned or either size signed.  The cases where a 64 bit index type (either signed or unsigned) is smaller/faster than 32 bit would be fixed by such an attribute, but even without fixing that, they are dwarfed by the cases where a 32 bit unsigned index type is better than 64 bit.  Through all of that, the beginner choice of 32 bit signed for an indexed type produces the least efficient code.

All of this also only applies in the case where you are really trying to push performance and you are really sure there won't be more than 4 billion items in one level of one container.  Otherwise (and for beginners, regardless) your index type should be size_t because that is the library's index type.
I have done a lot programming in situations where the performance of the index type is critical, and even though the total number of objects exceeds 4 billion, the total number of objects visible to a single indexing operation cannot.  I always do something like:typedef unsigned index_t;and then use index_t for all indexes.That allows a test build changing just that typedef to use std::size_t to verify both that the results are the same and that those same results take significantly longer.  I think there are only a dozen places in a massive amount of code where profiling then told me to do just a bit better by changing p[x+1] to (p+1)[x] where p was an iterator and x was an index.  The optimizer would get to the same place with p[size_t(x)+1], which also works for vectors, but I find (p+1)[x] clearer and I tended to access through iterators even when the containers were vectors.

-----Original Message-----
From: Martin Sebor <msebor@xxxxxxxxx>
To: LIU Hao <lh_mouse@xxxxxxx>; johnsfine@xxxxxxxxxxx; gcc-help@xxxxxxxxxxx <gcc-help@xxxxxxxxxxx>; jg@xxxxxxxx <jg@xxxxxxxx>
Cc: linux-man@xxxxxxxxxxxxxxx <linux-man@xxxxxxxxxxxxxxx>; fw@xxxxxxxxxxxxx <fw@xxxxxxxxxxxxx>; mtk.manpages@xxxxxxxxx <mtk.manpages@xxxxxxxxx>
Sent: Fri, Jul 9, 2021 1:26 pm
Subject: Re: strlen