Re: Post-increment constraint in inline assembly (SuperH)

On 01/29/2018 09:39 PM, Georg-Johann Lay wrote:
My bad.  If GCC uses post-increment, then the value in the post-incremented register no longer represents src. But when src+1 is used in the remainder, gcc detects that this value has already been computed and reuses the post-incremented register instead of recomputing src+1.

Hence src does /not/ change, whereas the register used to address *src /does/ if post-increment is used.  As src does not change, there's no need to express it in terms of constraints.

Fair enough. But in my memcpy() implementation's critical loop, I'd really like src to be in a register (and it is!). Which means GCC will either perform a copy before using it (which is overhead), or realize that src + 1 has already been computed and reuse it. But doing this is as difficult as performing the optimization that makes Segher's solution work in the first place.
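
For readers following along, here is a minimal sketch of the situation discussed above. It assumes GCC's documented '>' constraint letter (a memory operand with autoincrement addressing is allowed); the function name copy_words is made up for illustration, and the non-SuperH branch is only there so the snippet builds elsewhere. Whether GCC actually folds the src++ into a post-increment load depends on the optimizer, so treat this as an illustration rather than a guaranteed code sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit words from src to dst. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    while (n--) {
        uint32_t w;
#if defined(__sh__)
        /* "m>" lets GCC pick a post-increment address such as @r5+.
         * The C variable src is NOT an output of the asm; only the
         * register GCC chose to address *src may be incremented. */
        __asm__("mov.l %1, %0" : "=r"(w) : "m>"(*src));
#else
        /* Portable stand-in for other targets. */
        w = *src;
#endif
        src++;        /* candidate for merging into the post-increment */
        *dst++ = w;
    }
}
```

The point is that src++ stays in C: the asm statement only describes the memory operand, and GCC is free to reuse the incremented register for src + 1.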

Some libc implementations already perform such pre- and post-alignment, e.g. Newlib, provided it is compiled for speed (and the machine part doesn't deviate from that default).
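
As I read it, "pre- and post-alignment" means copying single bytes until the pointers are word-aligned, then whole words, then the leftover tail bytes. A rough sketch of that idea follows. It is not Newlib's code; the name copy_aligned, the 4-byte word size and the use of GCC's may_alias attribute are my assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC extension: a word type that may alias any other object. */
typedef uint32_t __attribute__((may_alias)) word_t;

/* Hypothetical name; same contract as memcpy (no overlap). */
void *copy_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Word copies only work if both pointers have the same
     * misalignment modulo the word size. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
        /* Head: bytes until both pointers are 4-byte aligned. */
        while (n && ((uintptr_t)d & 3)) {
            *d++ = *s++;
            n--;
        }
        /* Body: whole words. */
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        while (n >= 4) {
            *dw++ = *sw++;
            n -= 4;
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* Tail, or the whole copy if the alignments differ. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

The word loop is where a post-increment load would pay off on SuperH; the head and tail loops only run a few iterations each.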

As could be seen from my original message, I'm working in a free-standing environment that resembles a kernel, and it doesn't yet have a libc. I'm just at the point where GCC requires the user to implement the core memory functions.

The downside of assembler is that it cannot be inlined, so if the call overhead matters, you may want an assembler version with a defined code sequence (and call overhead) for large sizes, and a C implementation that can be inlined for small sizes.

I thought of using link-time optimization for this purpose, but it was silly - the whole point of LTO is to dump the /internal representation/ to the object file. There isn't any GIMPLE for assembler functions.

... which is pretty sad actually.

Notice that gcc already performs inline expansion of memcpy provided it is not inhibited by -fno-builtin-memcpy, -ffreestanding etc.  In the latter case you can use __builtin_memcpy for small sizes.  The point where gcc switches from inline expansion and unrolling to a libcall (if any) depends on optimization options, (known) alignment, the size to copy and also how much work has been put into the respective backend.

I am indeed using -ffreestanding, thanks for pointing that out. I definitely want the builtin memcpy() to be used for small sizes, and I will redefine the appropriate macro.
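
One possible shape for this, sketched under a few assumptions: the names copy and copy_large and the threshold SMALL_COPY_MAX are invented for the example, and copy_large is a plain C stand-in for the out-of-line assembler routine. The wrapper relies on GCC's __builtin_constant_p and __builtin_memcpy.

```c
#include <stddef.h>

/* Hypothetical threshold; the right value depends on the target. */
#define SMALL_COPY_MAX 16

/* Stand-in for the out-of-line assembler routine (hypothetical name). */
void *copy_large(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Inline wrapper: small compile-time-constant sizes go to the builtin,
 * which GCC can expand inline; everything else takes the call. */
static inline void *copy(void *dst, const void *src, size_t n)
{
    if (__builtin_constant_p(n) && n <= SMALL_COPY_MAX)
        return __builtin_memcpy(dst, src, n);
    return copy_large(dst, src, n);
}
```

Without optimization __builtin_constant_p(n) evaluates to 0 here, so every copy takes the out-of-line path; the result is the same either way. Note that __builtin_memcpy may still emit a call to memcpy when GCC decides not to expand it inline, so a real memcpy symbol has to exist in a free-standing build.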

Regards,
Sébastien


