Re: Optimising away memset() calls?

On 10/10/14 09:46, David Brown wrote:
> When a function is specified in the C standards, then the compiler
> /does/ know all about it.  It knows that the memset_s library function
> does not "store s in a global variable", because the C standard does not
> allow it to do that (or at least, it does not allow such an action to be
> visible to the program).

I was referring both to compilers that don't know the memset_s definition and
to those that do. For the latter, the compiler knows that memset_s won't store
s in a global variable, but the semantics of "shall assume that the memory
indicated by s and n may be accessible in the future and thus must contain
the values indicated by c" would be equivalent in this respect to "the pointer
is stored in a global variable".

> And the compiler is free to implement memset_s
> in any way it wants, including inlining it

That shouldn't be a problem.

> or perhaps even removing it
> as long as the behaviour is correct as seen by the C abstract machine.

My point was that the two things (removal + correct behaviour) cannot both be had.
(I was mildly expecting someone to readily present a counterexample, though)
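
For reference, the plain memset() case from the subject line looks roughly like the sketch below. The names handle_secret and checksum are made up for illustration, and whether a given compiler really drops the call depends on the compiler and the optimisation level.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a trivial stand-in for "doing something with the key". */
static unsigned checksum(const unsigned char *key, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += key[i];
    return sum;
}

/* Hypothetical function that keeps a secret in a local buffer. */
unsigned handle_secret(const unsigned char *input, size_t n)
{
    unsigned char key[32];
    unsigned result;

    if (n > sizeof key)
        n = sizeof key;
    memcpy(key, input, n);
    result = checksum(key, n);

    /* key is never read again, so under the as-if rule an optimiser
       may treat this store as dead and remove the call entirely. */
    memset(key, 0, sizeof key);
    return result;
}
```

The return value is the same with or without the memset(), which is exactly why the abstract machine cannot tell the difference.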

> This is complicated by the fact that the standards don't actually
> specify what is meant by things like "memory accesses".
>
> Adding to that, as has been noted by others, particular architectures
> might need things like memory barriers, cache flushes, synchronisation
> instructions, etc., in order for the writes to be visible across the
> system.  The C compiler knows nothing about these things (it can provide
> helpful intrinsic functions, but can't use them automatically), because
> the C standards don't cover them.

This is an interesting point. I agree that the compiler could reorder the memset_s call, but I don't think it could do more than delay it past a few statements that it can prove don't access that area, and no later than the next library call (even if it is in the C spec, remember
that the way they are implemented is not defined).

Thus, the zeroed contents might not be immediately available to an omniscient inspector, but they would be within a small delta. Or, if we have a bizarre architecture needing a barrier in order to "commit" the memory write, that barrier shall be performed by memset_s in order to fulfil
the "must contain the values indicated by c" requirement.

It is true that you may need special instructions to ensure that the new value is visible from a concurrent thread (I don't consider memset_s suitable for clearing a spinlock), but C doesn't deal with threads or shared memory, thus you are [...]. Then you either use another primitive to synchronize them (and at that point the memory will have to have been set), or they are in a race condition
and nothing is specified about what the memory may contain.
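
As one possible reading of "another primitive", here is a sketch using a C11 release/acquire flag. It assumes a compiler that ships <stdatomic.h> (an optional C11 feature); shared_buf, cleared, writer_clear and reader_sees_zeroes are all hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

static unsigned char shared_buf[16];  /* hypothetical shared area */
static atomic_int cleared;            /* hypothetical "has been cleared" flag */

/* Writer side: clear the buffer, then publish that fact with release order. */
void writer_clear(void)
{
    memset(shared_buf, 0, sizeof shared_buf);
    atomic_store_explicit(&cleared, 1, memory_order_release);
}

/* Reader side: returns -1 if the flag is not set yet (reading the buffer
   now would be a race), 1 if the buffer is all zeroes, 0 otherwise. */
int reader_sees_zeroes(void)
{
    if (!atomic_load_explicit(&cleared, memory_order_acquire))
        return -1;
    for (size_t i = 0; i < sizeof shared_buf; i++)
        if (shared_buf[i] != 0)
            return 0;
    return 1;
}
```

A reader that observes the flag through the acquire load is ordered after the stores made before the release store, so the clearing cannot be skipped; a reader that ignores the flag is in the race described above.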

I also speculated about a processor that optimized the microcode in such a way that it ended up removing the memset_s call, but concluded that (in addition to it being unrealistic for the CPU to have such global knowledge) memset_s would then have to include its own special barriers to fulfil the "shall assume that the memory indicated by s and n may be accessible in the future and thus must contain the values indicated by c" requirement, if it would otherwise be implemented with instructions that
allow the processor not to store the values in memory.


> So the only way to be absolutely sure that a memory area really is
> cleared is to use an external function that the compiler does not know
> about, and which also incorporates any required additional
> machine-specific code.  Thus you need to use memzero_explicit(),
> bzero_explicit(), or equivalent.

And that is architecture-specific and not portable. IMHO memset_s serves the task equally well, with the benefit
of being a standard function.
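
To make the two options concrete, here is a rough sketch of a wrapper that uses memset_s where the implementation provides it and falls back otherwise. secure_zero is a made-up name; the middle branch assumes GCC or Clang extended asm, and the last branch relies on the common (but not strictly guaranteed) practice of writing through a volatile-qualified pointer. Note that memset_s lives in the optional Annex K, so the feature test is needed.

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* must come before the first standard header */
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: zero n bytes at s in a way the optimiser should keep. */
void secure_zero(void *s, size_t n)
{
#if defined(__STDC_LIB_EXT1__)
    /* Annex K is available: use the standard function. */
    (void)memset_s(s, n, 0, n);
#elif defined(__GNUC__)
    /* GCC/Clang assumption: an empty asm that takes the pointer and
       clobbers memory makes the compiler treat the buffer as still in use. */
    memset(s, 0, n);
    __asm__ __volatile__("" : : "r"(s) : "memory");
#else
    /* Fallback: byte stores through a volatile-qualified pointer. */
    volatile unsigned char *p = s;
    while (n--)
        *p++ = 0;
#endif
}
```

None of the branches adds cache flushes or hardware barriers; they only address the compiler removing the stores.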




PS: As I was finishing this email, I thought of the following dull implementation (error checking skipped):

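#define __STDC_WANT_LIB_EXT1__ 1   /* errno_t and rsize_t need Annex K support */
#include <stdlib.h>

errno_t memset_s(void *s, rsize_t smax, int c, rsize_t n) {
   size_t i; unsigned char *tmp;
   (void)smax;                     /* bounds checking skipped */
   tmp = malloc(n);                /* NULL check skipped */
   /* tmp = s XOR c */
   for (i = 0; i < n; i++) {
     tmp[i] = ((unsigned char *)s)[i] ^ (unsigned char) c;
   }
   /* s = s XOR tmp = s XOR s XOR c = c */
   for (i = 0; i < n; i++) {
     ((unsigned char *)s)[i] ^= tmp[i];
   }
   free(tmp);
   return 0;
}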

Although completely missing the point, I think it would be conformant.*

* If you know that a specific implementation is flawed, please avoid it (you can use a replacement) or, better yet, replace your libc with a good one.
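
Presumably the reason it misses the point is the scratch buffer: it ends up holding s XOR c, which for c == 0 is a verbatim copy of the data being wiped. The sketch below is a variant under a hypothetical name, xor_clear_keep_scratch, that hands the scratch buffer back to the caller so this can be observed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variant of the XOR-based clear. It returns the scratch
   buffer (the caller must free it) purely so its contents can be inspected.
   Returns NULL if the allocation fails. */
unsigned char *xor_clear_keep_scratch(void *s, int c, size_t n)
{
    unsigned char *p = s;
    unsigned char *tmp = malloc(n);
    size_t i;

    if (tmp == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        tmp[i] = p[i] ^ (unsigned char)c;   /* tmp = s XOR c */
    for (i = 0; i < n; i++)
        p[i] ^= tmp[i];                     /* s = s XOR tmp = c */
    return tmp;
}
```

After the call the destination holds c in every byte, as required, while the scratch memory still carries the old contents.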






