Re: server-side extension in c++

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Wed, 02 Jun 2010 20:36:00 +0800

On 02/06/10 19:17, Peter Geoghegan wrote:

>> Similarly, calling Pg code that may use Pg's error handling from within
>> C++ is unsafe. It should be OK if you know for absolute certain that the
>> C++ call tree in question only has plain-old-data (POD) structs and
>> simple variables on the stack, but even then it requires caution. C++
>> code that uses Pg calls can't do anything it couldn't do if you were
>> using 'goto' and labels in each involved function, but additionally has
>> to worry about returning and passing non-POD objects between functions
>> in a call chain by value, as a longjmp may result in dtors not being
>> properly called.
> 
> 
> Really? That seems like an *incredibly* arduous requirement.
> Intuitively, I find it difficult to believe. After all, even though
> using longjmp in C++ code is a fast track to undefined behaviour, I
> would have imagined that doing so in an isolated C module with a well
> defined interface, called from C++ would be safe.

Not necessarily. It's only safe if setjmp/longjmp calls occur only
within the C code without "breaking" call paths involving C++.

This is ok:

   [ C ]
   entrypoint()
   callIntoCppCode()
   [ C++ ]
   someCalls()
   callIntoCCode()
   [ C ]
   setjmp()
   doSomeStuff()
   longjmp()

This is really, really not:

   [ C ]
   entrypoint()
   setjmp()        <----
   callIntoCppCode()
   [ C++ ]
   someCalls()
   callIntoCCode()
   [ C ]
   doSomeStuff()
   longjmp()

See the attached demo (pop all files in the same directory then run "make").

> I would have
> imagined that ultimately, the call to the Pg C function must return,
> and therefore cannot affect stack unwinding within the C++ part of the
> program.

That's the whole point; a longjmp breaks the call chain, and the
guarantee that eventually the stack will unwind as functions return.
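To make "nothing on the stack needs unwinding" something you can check rather than eyeball, here is a minimal sketch. It is not part of the attached demo, and the helper name safe_to_jump_over and the example structs are made up for illustration. It leans on std::is_trivially_destructible, which is roughly the condition the C++ standard attaches to longjmp(): jumping over an automatic object is only defined if skipping its destructor loses nothing.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper (not from the demo): true only for types whose
// destructor is trivial, so a longjmp() over them skips no cleanup work.
template <typename T>
constexpr bool safe_to_jump_over() {
    return std::is_trivially_destructible<T>::value;
}

// A plain-old-data struct: nothing to clean up.
struct PlainRecord {
    int id;
    double value;
};

// A type with a user-provided destructor, like the Reference class in
// the demo: skipping the destructor would leak whatever it releases.
struct Guard {
    ~Guard() {}
};

static_assert(safe_to_jump_over<PlainRecord>(),
              "POD structs may sit in frames that a longjmp() crosses");
static_assert(!safe_to_jump_over<Guard>(),
              "objects with real destructors must not be jumped over");
static_assert(!safe_to_jump_over<std::string>(),
              "std::string owns memory and must be destroyed normally");
```

A check like this only covers the types you remember to name; it says nothing about temporaries or by-value arguments the compiler creates for you.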

It's OK if you setjmp(a), do some work, setjmp(b), longjmp(a), do some
work, longjmp(b), return.

My understanding, which is likely imperfect, is that Pg's error handling
does NOT guarantee that, i.e. it's quite possible that a function may call
longjmp() without first setting up a jmp_buf of its own to "jump back to",
and therefore will never return.

> To invoke a reductio ad absurdum argument, if this were the case,
> calling C functions from C++ would be widely considered a dangerous
> thing to do, which it is not.

If those C functions use setjmp/longjmp, it *is* a dangerous thing to
do. Most libraries that use setjmp/longjmp in ways that may affect
calling code DO document this, and it's expected that the user of the
library will know what that entails.
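One common way to live with such a library from C++ is to put a small barrier frame around each call, so the longjmp() lands before it can cross any frame holding non-POD objects, and is then re-raised as an ordinary C++ exception. The sketch below assumes a made-up C-style library: active_handler, c_function_that_may_fail and call_guarded are hypothetical names for illustration, not Pg's API.

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>

// Hypothetical stand-in for a C library that reports errors by jumping
// to whatever handler is currently registered.
static std::jmp_buf *active_handler = nullptr;

static void c_function_that_may_fail(bool fail) {
    if (fail && active_handler != nullptr)
        std::longjmp(*active_handler, 1);
}

// Barrier frame: it holds only trivially destructible locals, and it is
// the only frame the longjmp() unwinds into. The error is then turned
// into a C++ exception, which unwinds the callers normally.
void call_guarded(bool fail) {
    std::jmp_buf here;
    std::jmp_buf *previous = active_handler;  // not modified after setjmp()
    if (setjmp(here) == 0) {
        active_handler = &here;
        c_function_that_may_fail(fail);
        active_handler = previous;
        return;
    }
    // We arrive here via longjmp(): restore the handler, then throw.
    active_handler = previous;
    throw std::runtime_error("C code reported an error");
}

// Counts live objects so skipped destructors would be visible.
struct Tracker {
    int &count;
    explicit Tracker(int &c) : count(c) { ++count; }
    ~Tracker() { --count; }
};

// Returns the number of Tracker objects still alive after the call;
// sets threw to whether the C error surfaced as an exception.
int live_objects_after(bool fail, bool &threw) {
    int live = 0;
    threw = false;
    try {
        Tracker t(live);
        assert(live == 1);  // one tracker alive while the call runs
        call_guarded(fail);
    } catch (const std::runtime_error &) {
        threw = true;       // t was already destroyed by unwinding
    }
    return live;
}
```

The important property is that nothing with a destructor lives between the setjmp() and the code that may longjmp(); everything above call_guarded() sees only a C++ exception.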

If the library uses setjmp/longjmp entirely internally, so that it never

http://stackoverflow.com/questions/1376085/c-safe-to-use-longjmp-and-setjmp

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

/* Attachment: cppmodule.cpp */
#include <iostream>

/* Extern reference to the longjmp-using code we'll call */
extern "C" {
	void calledFromCppCode();
}

/*
 * A typical reference-counted object using embedded-style
 * refcounting, and a typical counted reference to it.
 */

class RefCountedObject {
  int refCount;
public:
  RefCountedObject() : refCount(0) {
  }
  ~RefCountedObject() {
    std::cerr << "RefCountedObject destroyed with refcount==" << refCount << std::endl;
  }
  int getRefCount() {
    return refCount;
  }
protected:
  friend class Reference;
  void incRef() {
    std::cerr << "Incrementing refcount from " << refCount << " to " << (refCount+1) << std::endl;
    refCount++;
  }
  void decRef() {
    std::cerr << "Decrementing refcount from " << refCount << " to " << (refCount-1) << std::endl;
    refCount--;
  }
};

class Reference {
  RefCountedObject& ref;
public:
  Reference(RefCountedObject& refObj) : ref(refObj) {
    ref.incRef();
  }
  ~Reference() {
    ref.decRef();
  }
  RefCountedObject& operator*() {
    return ref;
  }
  RefCountedObject * get() {
    return &ref;
  }
};

static RefCountedObject fred = RefCountedObject();

extern "C" void cppCall() {
  std::cerr << "Entering C++ module, refcount is " << fred.getRefCount() << '\n';
  if (1) {
    /* A stack-based object with a dtor (here, a counted reference) that should be destroyed when this scope unwinds */
    Reference ref( fred );
    /* and call into the C code that'll end up doing a longjmp() while that stack object still exists */
    std::cerr << "Calling back into C, refcount is " << fred.getRefCount() << '\n';
    calledFromCppCode();
    std::cerr << "Returning from call into C, refcount is " << fred.getRefCount() << '\n';
  }
  std::cerr << "Scope with ref exited, refcount is " << fred.getRefCount() << '\n';
}

extern "C" int cppSanityCheck() {
  /* Refcount should be zero if everything is OK and we haven't leaked any references,
   * as there's no way to go back to where we started from. */
  int refCount = fred.getRefCount();
  std::cerr << "Final ref count is: " << refCount << std::endl;
  return refCount;
}

/* Attachment: main.c */
#include <stdlib.h>
#include <stdio.h>
#include <setjmp.h>

extern int cppSanityCheck(void);
extern void cppCall(void);

static jmp_buf jumpPoint;

static void entryPoint() {
#if defined(UNSAFE_JUMP)
  /* this is unsafe. We'll setjmp here, then call into C++ which will
   * create a non-POD object on the stack then call back into our C module
   * which will longjmp(). We'll land up back here without having unwound
   * the C++ stack, and with no way to go back.
   */
  if (setjmp(jumpPoint) != 0) {
    /* just returned from our jump. */
    return;
  }
#endif
  /* this would generally be a call to a dlopen()ed function pointer to an extern "C" function in practice ...*/
  cppCall();
}

static void doSomeStuff() {
  /* While doing some work, we hit an error and longjmp() */
  longjmp(jumpPoint, 1);
}

void calledFromCppCode() {
#if !defined(UNSAFE_JUMP)
  /* this is OK, since we won't enter the C++ module again between setjmp() and longjmp() */
  if (setjmp(jumpPoint) != 0) {
    /* Just returned from our jump */
    return;
  }
#endif
  doSomeStuff();
}

int main() {
  entryPoint();
  int ret = cppSanityCheck();
  if (ret != 0) {
    printf("***** About to exit with LEAKED OBJECTS NOT PROPERLY CLEANED UP ****\n");
  }
}

# Attachment: Makefile
CARGS=-std=c99 -Wall

all: safe unsafe
	@echo
	@echo invoking "safe"
	@echo
	LD_LIBRARY_PATH=. ./safe
	@echo
	@echo invoking "unsafe"
	@echo
	LD_LIBRARY_PATH=. ./unsafe

unsafe: main.c libcppmodule.so
	gcc $(CARGS) -DUNSAFE_JUMP -o unsafe libcppmodule.so main.c

safe: main.c libcppmodule.so
	gcc $(CARGS) -o safe libcppmodule.so main.c

libcppmodule.so: cppmodule.cpp
	g++ -Wall -fPIC -shared -o libcppmodule.so cppmodule.cpp

clean:
	rm -f libcppmodule.so safe unsafe