On Wed, Oct 19, 2022 at 10:45 AM Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> wrote: > > When I did this more than a decade ago there indeed was a LOT of noise, > mostly caused by dubious code. It really happens with explicitly *not* dubious code. Using 'unsigned char[]' is very common in code that actually does anything where you care about the actual byte values. Things like utf-8 handling, things like compression, lots and lots of cases. But a number of those cases are still dealing with *strings*. UTF-8 is still a perfectly valid C string format, and using 'strlen()' on a buffer that contains UTF-8 is neither unusual nor wrong. It is still the proper way to get the byte length of the thing. It's how UTF-8 is literally designed. And -Wpointer-sign will complain about that, unless you start doing explicit casting, which is just a worse fix than the disease. Explicit casts are bad (unless, of course, you are explicitly trying to violate the type system, when they are both required, and a great way to say "look, I'm doing something dangerous"). So people who say "just cast it", don't understand that casts *should* be seen as "this code is doing something special, tread carefully". If you just randomly add casts to shut up a warning, the casts become normalized and don't raise the kind of warning signs that they *should* raise. And it's really annoying, because the code ends up using 'unsigned char' exactly _because_ it's trying to be careful and explicit about signs, and then the warning makes that carefully written code worse. > Then suggest something better? Or suggest improvements to the existing > warning? As I mentioned in the next email, I tried to come up with something better in sparse, which wasn't based on the pointer type comparison, but on the actual 'char' itself. My (admittedly only ever half-implemented) thing actually worked fine for the simple cases (where simplification would end up just undoing all the "expand char to int" because the end use was just assigned to another char, or it was masked for other reasons). But while sparse does a lot of basic optimizations, it still left enough "look, you're doing sign-extensions on a 'char'" on the table that it warned about perfectly valid stuff. And maybe that's fundamentally hard. The "-Wpointer-sign" thing could probably be fairly easily improved, by just recognizing that things like 'strlen()' and friends do not care about the sign of 'char', and neither does a 'strcmp()' that only checks for equality (but if you check the *sign* of strcmp, it does matter). It's been some time since I last tried it, but at least from memory, it really was mostly the standard C string functions that caused almost all problems. Your *own* functions you can just make sure the signedness is right, but it's really really annoying when you try to be careful about the byte signs, and the compiler starts complaining just because you want to use the bog-standard 'strlen()' function. And no, something like 'ustrlen()' with a hidden cast is just noise for a warning that really shouldn't exist. So some way to say 'this function really doesn't care about the sign of this pointer' (and having the compiler know that for the string functions it already knows about anyway) would probably make almost all problems with -Wsign-warning go away. Put another way: 'char *' is so fundamental and inherent in C, that you can't just warn when people use it in contexts where sign really doesn't matter. Linus