Re: g++ 4.2.x x86: code generation for __sync_lock_test_and_set() - builtin

Andrew Haley <aph@xxxxxxxxxx> · Mon, 25 Feb 2008 15:03:27 +0000

Daniel Lohmann wrote:
g++ 4.2.3

Hi,

I have the following code, which uses the new __sync_lock_test_and_set() 
builtin:

class Mutex {
  int locked;
public:
  Mutex() {
    locked = 0;
  }
  void lock();
  void unlock();
};

void Mutex::lock() {
  while( __sync_lock_test_and_set( &locked, 1) == 0 )
    ;
}
void Mutex::unlock() {
  __sync_lock_release( &locked );
}
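
A note on the loop condition: __sync_lock_test_and_set() returns the
previous contents of the location, so a spinlock would normally keep
spinning while that value is non-zero. The sketch below shows that
variant. The names SpinLock, try_lock() and spinlock_demo() are
hypothetical and not part of the code above, and the disassembly that
follows corresponds to the loop exactly as posted.

```cpp
#include <cassert>

// Hypothetical class name; mirrors the Mutex class from the post.
class SpinLock {
  int locked;
public:
  SpinLock() : locked(0) {}

  // Returns true if the lock was free and is now held by the caller.
  bool try_lock() {
    // The builtin atomically stores 1 and returns the previous value.
    return __sync_lock_test_and_set(&locked, 1) == 0;
  }

  void lock() {
    // Spin while the previous value was non-zero, i.e. while
    // somebody else already holds the lock.
    while (__sync_lock_test_and_set(&locked, 1) != 0)
      ;
  }

  void unlock() {
    __sync_lock_release(&locked);
  }
};

// Single-threaded sanity check (hypothetical helper).
int spinlock_demo() {
  SpinLock m;
  m.lock();                    // lock is free, so this returns at once
  bool second = m.try_lock();  // already held, expected to fail
  m.unlock();
  bool third = m.try_lock();   // free again, expected to succeed
  m.unlock();
  return (second ? 0 : 1) + (third ? 2 : 0);
}
```

With this condition, lock() returns as soon as the previous value was 0,
which is the case where the caller has just taken the lock.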

After compiling with -O3 -fomit-frame-pointer, the resulting code for 
the Mutex::lock() method looks as follows:

00000010 <Mutex::lock()>:
  10:    8b 54 24 04              mov    0x4(%esp),%edx
  14:    b8 01 00 00 00           mov    $0x1,%eax
  19:    87 02                    xchg   %eax,(%edx)
  1b:    85 c0                    test   %eax,%eax
  1d:    74 f5                    je     14 <Mutex::lock()+0x4>
  1f:    f3 c3                    repz ret

I am wondering about the repz prefix before the ret. A "do RET until 
Z-Flag is set" obviously does not make sense from the functional point 
of view. So I assume that it actually is a side effect of the repz 
prefix that is exploited here to guarantee "something" with respect to 
instruction reordering, fetching, caching, or ...?

So what exactly is this "something"?
And what exactly could happen under which circumstances if we don't use it?

Google does not reveal much. If one googles for "repz ret" one gets a 
*load* of hits, but only because "ret" appears immediately after 
"repz" in the alphabetically sorted list of x86 instructions :-)

If you grep the gcc source you'll find

;; Used by x86_machine_dependent_reorg to avoid penalty on single byte RET
;; instruction Athlon and K8 have.

(define_insn "return_internal_long"
 [(return)
  (unspec [(const_int 0)] UNSPEC_REP)]
 "reload_completed"
 "rep\;ret"
 [(set_attr "length" "1")
  (set_attr "length_immediate" "0")
  (set_attr "prefix_rep" "1")
  (set_attr "modrm" "0")])
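
In other words, the prefix is not there for any functional effect. It
only turns the one-byte ret into a two-byte instruction. As far as I
know, AMD's optimization guidance for Athlon and K8 recommends this form
when the return is the target of a branch or directly follows a
conditional branch, which is the situation after the "je" in the listing
above. A minimal sketch for reproducing it, assuming an x86 target and a
GCC of this era, follows. The function names are hypothetical, and
whether "rep ret" or a plain "ret" is emitted depends on the compiler
version and the tuning options.

```cpp
#include <cassert>

// Hypothetical helper: spins until *flag becomes non-zero.
// The loop ends in a conditional branch back to the load, so the
// function's return sits directly after that branch.
void wait_nonzero(volatile int* flag) {
  while (*flag == 0)
    ;
}

// Single-threaded check: the flag is already set, so the wait
// falls through immediately.
int wait_demo() {
  volatile int flag = 1;
  wait_nonzero(&flag);
  return flag;
}

// To compare the generated code (assumed invocations):
//   g++ -O3 -fomit-frame-pointer -mtune=k8 -S wait.cpp
//   g++ -O3 -fomit-frame-pointer -mtune=pentium4 -S wait.cpp
// then look at the end of wait_nonzero() in each wait.s.
```

Both variants behave identically. Only the encoding of the return
differs.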