How can I lock asm blocks in place?

I'm trying to create some template specializations to combine short movs
into larger movs. A simple example might be:

struct SPoint {
    uint16_t x;
    uint16_t y;
    inline void operator= (const SPoint& v)
    {
        x = v.x;
        y = v.y;
    }
};

I then try to implement something like this:

inline void SPoint::operator= (const SPoint& v)
{
    *reinterpret_cast<uint32_t*>(&x) =
        *reinterpret_cast<const uint32_t*>(&v.x);
}

The point structure is very convenient to use, and hacks like this give
a very useful code size reduction without having to write everything in
assembly. Unfortunately, the above code breaks the strict-aliasing rules
and doesn't compile. So I'm using the union cast hack to get around it,
like this:

template <typename DEST, typename SRC>
inline DEST noalias_cast (SRC s)
{
    asm(""::"g"(s));
    union { SRC s; DEST d; } u = {s};
    return (u.d);
}

inline void SPoint::operator= (const SPoint& v)
{
    *noalias_cast<uint32_t*>(&x) =
        *noalias_cast<const uint32_t*>(&v.x);
}

This works just fine, except in some cases when optimization is turned
on. The optimizer doesn't see what I'm doing, and feels free to
rearrange instructions all around it, instantiating points in the wrong
place and reading from uninitialized memory. Next I tried an explicit
asm touch, just like the one in noalias_cast, and for the same reasons:

inline void SPoint::operator= (const SPoint& v)
{
    asm (""::"m"(x),"m"(y));
    asm (""::"m"(v.x),"m"(v.y));
    *noalias_cast<uint32_t*>(&x) =
        *noalias_cast<const uint32_t*>(&v.x);
    asm ("":"=m"(x),"=m"(y));
}

This does help in the majority of cases, but sometimes the optimizer
still screws things up.

One possibility, of course, is to combine the whole thing into a single
asm statement. Unfortunately, it does not work in the general case; I
have tuples of various sizes of up to eight elements that get the above
treatment (see
http://ustl.svn.sourceforge.net/viewvc/ustl/trunk/utuple.h?revision=445&view=markup)
and the assembler runs out of parameters. Each parameter, even an "m",
counts toward the register allocation limit, and if you have more than
eight arguments, the asm block fails to compile. (That's probably a bug.)

So my question is, how can I resolve this problem?

Is there some way to tell the optimizer to keep the asm blocks together
and not to interleave anything between them? volatile doesn't do it.

Is there some other way to tie an aliased pointer to the target variable
so that the compiler would know what's being accessed?
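
One candidate for this, given as a minimal sketch and not as a confirmed
fix: GCC's may_alias type attribute, which exempts accesses through the
marked type from type-based alias analysis, much like accesses through
char. The typedef name aliased_u32 and the check_assignment helper are
hypothetical names made up for illustration. The sketch assumes a GCC or
Clang compiler, a 2-byte alignment for uint16_t, and a target that
tolerates a 32-bit access at that alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias-friendly 32-bit type (GCC/Clang extension, not ISO C++).
// may_alias: accesses through this type are not subject to strict aliasing.
// aligned(2): assumed to match SPoint's alignment, so the compiler does not
// assume the address is 4-byte aligned.
typedef uint32_t __attribute__((__may_alias__, __aligned__(2))) aliased_u32;

struct SPoint {
    uint16_t x;
    uint16_t y;
    inline void operator= (const SPoint& v)
    {
        // One 32-bit load and one 32-bit store covering both x and y.
        *reinterpret_cast<aliased_u32*>(this) =
            *reinterpret_cast<const aliased_u32*>(&v);
    }
};

// The trick only makes sense if x and y are contiguous with no padding.
static_assert(sizeof(SPoint) == sizeof(uint32_t), "unexpected SPoint layout");

// Small self-check: after assignment both fields must match the source.
inline void check_assignment ()
{
    SPoint a = {0x1234, 0xabcd};
    SPoint b = {0, 0};
    b = a;
    assert(b.x == 0x1234 && b.y == 0xabcd);
}
```

Because the compiler is told that the 32-bit access may touch any object,
no asm touches are involved in this variant. Whether it actually produces
a single mov has to be checked in the generated assembly.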

Or any other ideas?
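
For comparison, here is a sketch of a portable alternative that stays
within the aliasing rules: a fixed-size memcpy through a uint32_t
temporary. SPointPod and copy_point are hypothetical names, and the
struct deliberately has no user-written operator= so that it remains
trivially copyable. It is an assumption, to be verified in the assembly
output, that the optimizer lowers the two memcpy calls to a single
32-bit load and store.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical plain-data variant of the point structure. With no
// user-written operator= the type stays trivially copyable, so copying
// its bytes with memcpy is well defined.
struct SPointPod {
    uint16_t x;
    uint16_t y;
};

static_assert(sizeof(SPointPod) == sizeof(uint32_t), "unexpected layout");

// Hypothetical helper: copy both fields as one 32-bit unit.
inline void copy_point (SPointPod& dst, const SPointPod& src)
{
    uint32_t bits;
    std::memcpy(&bits, &src, sizeof(bits));   // read x and y together
    std::memcpy(&dst, &bits, sizeof(bits));   // write x and y together
}

// Small self-check: both fields must arrive unchanged.
inline void check_copy_point ()
{
    SPointPod a = {0x1234, 0xabcd};
    SPointPod b = {0, 0};
    copy_point(b, a);
    assert(b.x == 0x1234 && b.y == 0xabcd);
}
```

The compiler sees exactly which bytes are read and written here, so it
has no reason to reorder the copy against other accesses to the point.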
-- 
Mike
msharov@xxxxxxxxxxxxxxxxxxxxx
