Hi,

I'm still trying to solve the problems porting libstdc++ in GCC 4.4.0 to Interix. I notice that when I compile the test code below with the flags -fopenmp -D_GLIBCXX_PARALLEL -O3, it works as expected at a decent speed. But if I omit -D_GLIBCXX_PARALLEL, the weird behaviour occurs whenever the program requests more than one thread: high CPU kernel times and slow execution.

Are there any libstdc++ gurus out there who know what these symptoms might mean?

Many thanks,
Rob

----- Original Message -----
From: "Robert Oeffner" <robert@xxxxxxxxxxx>
To: <gcc-help@xxxxxxxxxxx>
Sent: Saturday, September 26, 2009 11:20 AM
Subject: libstdc++ and openmp problem with GCC4.4.0 port to interix

> Hi,
>
> Probably a long shot, but I wonder if anyone has a useful tip on a
> problem porting GCC 4.4.0 to Interix (a BSD-like OS running on top of the
> Windows kernel).
>
> As libgomp in GCC so far isn't targeting Interix, I have made some changes to
> libgomp in my copy of the GCC 4.4.0 distribution. I created a new source file,
> gcc-4.4.0/libgomp/config/posix/interix/proc.c, modelled on the existing
> gcc-4.4.0/libgomp/config/posix/proc.c and
> gcc-4.4.0/libgomp/config/posix/mingw32/proc.c in the distribution (see
> http://www.oeffner.net/stuff/gcc-4.4.0_interix_changes.zip or
> http://www.suacommunity.com/forum/tm.aspx?m=16600 ). With this file and
> modifications to the GCC configuration files in the distribution, I can
> bootstrap GCC 4.4.0 to build gcc and g++ compilers on Interix.
>
> The port produces fast code for single-threaded programs. However,
> there's a major problem with OpenMP. It has something to do with libstdc++,
> which tends to go into overdrive when you ask OpenMP to create more than
> one thread for the compiled program. Calls to string::clear() from
> libstdc++ somehow hog the CPU with high kernel times and run orders of
> magnitude slower. The code below demonstrates the problem.
> It runs fast
> with just one thread but abysmally slow when two or more threads are
> present. This happens even though the loop doing the work is single-threaded and
> the other threads remain idle.
>
> Windows Task Manager shows that execution time is roughly 50% kernel and 50%
> user time whenever you run more than one thread. With a single
> thread, execution time is spent entirely in user mode.
>
> As far as I know, locking and releasing data objects is done by the OS on
> behalf of a program's request, and that work is done in kernel mode. Are there
> situations where libstdc++ may be confused about idle threads in a program
> and then make unnecessary requests for locking and releasing data objects?
>
> If anyone has a suggestion on what causes these symptoms in my
> GCC port, that would be greatly appreciated.
>
> Many thanks,
>
> Rob
>
>
> #include <ctime>
> #include <iostream>
> #include <string>
> #include <omp.h>
>
> using namespace std;
>
> const long lmax = 50000;
>
> int main()
> {
>   int nthreads = 1;
>   cout << "Enter number of OpenMP threads to create: ";
>   cin >> nthreads;
>   omp_set_num_threads(nthreads);
>
>   // Create the thread team once; after this region the extra threads do no work.
> #pragma omp parallel
>   {
> #pragma omp single
>     cout << "Doing string stuff with " << omp_get_num_threads()
>          << " thread(s)" << endl;
>   }
>
>   time_t start, now;
>   time(&start);
>
>   string pairlbl("");
>
>   // Serial hot loop run by the master thread only:
>   // lmax * 2000 calls to string::clear().
>   for (long m = 0; m < lmax; m++)
>   {
>     if ((m % (lmax / 20)) == 0)
>       cout << "m = " << m << endl;
>
>     for (int j = 1; j <= 2000; j++)
>     {
>       pairlbl.clear();
>     }
>   }
>
>   time(&now);
>   cout << "\ntime= " << difftime(now, start) << " sec\n";
>
>   return 0;
> }