Re: strlen

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 This is not the forum for such a discussion.  But I want to make people reading this aware that many expert C and C++ programmers (likely a majority) consider that advice to avoid unsigned types to be horrible advice.
I advise people to avoid signed types and I do so myself.  If an integer value won't be negative, it shouldn't be signed.  That makes your intent clearer to anyone reading your code, and (especially in x86-64) lets the compiler generate smaller and faster code.
If you aren't an expert, I strongly suggest letting the library functions guide you in choice of data type.  Where it takes a size_t, your variables that exist primarily to be passed that way should be size_t.  Where it returns std::size_t, your variables that exist primarily to take that return should be size_t and especially, typical functions you write to parallel those functions should have the same return type.  This is a rule I usually break myself, because I write a lot of code in which speed is critical and in which various cache issues dominate speed (rather than actual computations dominating speed), because I know when values will reliably be less than 2 to 32, and because there are many usages in which 32 bit unsigned is the most efficient x86-64 data type (more efficient that size_t and much more efficient than int).  But for most programmers, the safety and clarity and portability of using size_t (rather than 32 bit unsigned) in such situations outweighs the performance difference.

Obviously, others disagree.  I hope I'm not starting a big discussion here (I don't mind a big discussion somewhere, but not here).  I just wanted to balance the one sided opinion (that I have seen too many times recently).
 
 
-----Original Message-----
From: Alejandro Colomar (man-pages) via Gcc-help <gcc-help@xxxxxxxxxxx>
To: Jonny Grant <jg@xxxxxxxx>
Cc: gcc-help@xxxxxxxxxxx; linux-man <linux-man@xxxxxxxxxxxxxxx>; Florian Weimer <fw@xxxxxxxxxxxxx>; Michael Kerrisk <mtk.manpages@xxxxxxxxx>
Sent: Thu, Jul 8, 2021 7:06 am
Subject: Re: strlen

On 7/8/21 12:07 PM, Jonny Grant wrote:
> Thank you for your reply.
> 
> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
> 
> I'd like to avoid the compiler removing certain execution paths.
> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
> 
> 
> Probably this would work:
> 
> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
> {
>      if (NULL == s) return 0;
>      else return strlen(s);
> }

I don't think you don't need that.  Unless there's a bug in GCC, it 
shouldn't optimize that path unless it is 100% sure that it will never 
be called.

Moreover, I recommend you to optimize as much as possible.
Even though NULL is possible in your code, I guess it's unlikely.

Also, calling a function safe is too generic.
I'd call it with the suffix null, as it act different on null.

Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW).
See <https://google.github.io/styleguide/cppguide.html#Integer_Types>.
Use the POSIX type 'ssize_t'.
That also allows differentiating a length of 0 (i.e., "") from an 
invalid string (i.e., NULL), by returning -1 for NULL.

Recommended implementation (requires C99 or later due to the usage of 
inline):

[
// compiler.h

#define likely(x)    __builtin_expect((x), 1)
#define unlikely(x)  __builtin_expect((x), 0)
]


[
// strlennull.h

#include <string.h>
#include <sys/types.h>

#include "compiler.h"

[[gnu::pure]]
inline ssize_t strlennull(const char *s);

inline ssize_t strlennull(const char *s)
{
    if (unlikely(!s))
        return -1;
    return strlen(s);
}
]


[
// strlennull.c

#include "strlennull.h"

#include <sys/types.h>

extern inline ssize_t strlennull(const char *s);
]


BTW, if the input is so untrusted that NULL may be possible, I guess it 
might as well contain invalid strings (non-NUL-terminated), for which 
strlen(3) would behave as bad or worse.  So I recommend having a look at 
strnlen(3) and maybe implement strnlennull() instead of strlennull().


Unless you _know_ there's a compiler bug that doesn't allow you to 
optimize, please optimize.  Otherwise, if it's just a bit of paranoia, 
you could as well not optimize any code at all.  Specifically not 
optimizing this code by the use of pragmas or attributes is *wrong*. 
Just revise the preprocessor output, the compiler output, and also try 
introducing a NULL at run time, to check that everything's fine.

> 
> I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out.

Are you sure you don't want to optimize?  Are those addresses hardware 
I/O or interrupts?

Maybe you just want membarrier(2).

> 
>>
>> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'.  If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary.  Of course, always use '-fanalyzer' (GCC 10 and above).
> 
> Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet.

You can install gcc-10 for Bionic: apt-get install gcc-10
I recommend it.  It finds bugs very deep in the code.


Cheers,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/




[Index of Archives]     [Linux C Programming]     [Linux Kernel]     [eCos]     [Fedora Development]     [Fedora Announce]     [Autoconf]     [The DWARVES Debugging Tools]     [Yosemite Campsites]     [Yosemite News]     [Linux GCC]

  Powered by Linux