Semaphores Problem

Hi

I am using simple synchronization between three threads in each
process: one computes, the second sends dependencies to other
processors, and the third receives dependencies from other
processors. I use MPI to send and receive between processors, and
within each process these three threads handle that process's share
of the computation.

The problem is in the thread synchronization, for which I use
semaphores. The sending thread waits until there is a dependency that
needs to be sent, and the receiving thread waits until the computation
requires a dependency and MPI needs to be checked for it. I have three
semaphores: one from the computation to the sending thread, one from
the computation to the receiving thread, and one from the receiving
thread back to the computation, to notify it that something has been
received and should be checked.

I initialize each of the three semaphores like this:

	if (sem_init(&icSem, 0, 0) != 0) {
		printf("Error initializing semaphore icSem, exiting\n");
		getSemError();
		return;
	}

and post like this:

	if (sem_post(&rcSem) != 0) {
		printf("Error posting semaphore rcSem, exiting\n");
		getSemError();
		return NULL;
	}

and wait like this:

	if (sem_wait(&dsSem) != 0) {
		printf("Error waiting on semaphore dsSem, exiting\n");
		getSemError();
		return NULL;
	}

In getSemError() I read the error code and print the matching message
from the man pages.
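
For readers who want to reproduce this, a helper of that kind might
look roughly like the sketch below. It is not the getSemError() from
my program: the name describe_sem_error is hypothetical, and apart from
the EINTR text the message strings are paraphrases rather than quotes
from the man pages. It takes the errno value as an argument on the
assumption that the caller saves errno immediately after the failing
call, since an intervening printf may overwrite it.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: maps an errno value, saved right after a failed
 * sem_init/sem_post/sem_wait call, to a short description. */
const char *describe_sem_error(int err)
{
    assert(err != 0);  /* only meaningful after a call has failed */

    switch (err) {
    case EINTR:     return "The call was interrupted by a signal handler";
    case EINVAL:    return "Invalid semaphore or invalid initial value";
    case EOVERFLOW: return "Maximum semaphore value would be exceeded";
    case EAGAIN:    return "Semaphore is currently zero (sem_trywait)";
    case ETIMEDOUT: return "Timed out before the semaphore could be locked";
    default:        return strerror(err);  /* fall back to the C library text */
    }
}
```

A caller would typically write "int saved = errno;" on the line right
after the failing sem_* call and pass saved to the helper.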

At run time I then get: "The call was interrupted by a signal
handler." This happens when the computation thread waits for the
receiving thread to receive a dependency that I need in order to
resume computing. The same problem happens when the sending thread
waits on another semaphore: the wait fails with the same error, on
practically all sem_wait calls.

Also, when I read the value of a semaphore, it sometimes exceeds 1; I
have read values such as 2 and 3. Yet for every sem_post there is a
previously called sem_wait that should consume the post as soon as it
arrives.
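
To make the counting behaviour concrete, here is a small
single-threaded sketch. The function value_after is a hypothetical
name used only for illustration. It posts a fresh semaphore a given
number of times, consumes some of those posts with sem_trywait (so the
sketch can never block), and returns what sem_getvalue reports. It
only shows that a POSIX semaphore counts every post that has not yet
been matched by a completed wait; it does not explain why that happens
in my program.

```c
#include <assert.h>
#include <semaphore.h>

/* Hypothetical demo: post `posts` times, take `waits` of them back,
 * and return the remaining semaphore value (or -1 on error). */
int value_after(unsigned posts, unsigned waits)
{
    sem_t sem;
    int value = -1;
    unsigned i;

    assert(waits <= posts);  /* never try to take more than was posted */

    if (sem_init(&sem, 0, 0) != 0)   /* pshared = 0: threads of one process */
        return -1;

    for (i = 0; i < posts; i++)
        if (sem_post(&sem) != 0)
            goto out;

    for (i = 0; i < waits; i++)
        if (sem_trywait(&sem) != 0)  /* non-blocking variant of sem_wait */
            goto out;

    if (sem_getvalue(&sem, &value) != 0)
        value = -1;
out:
    sem_destroy(&sem);
    return value;
}
```

With three posts and one completed wait the value is 2, which is the
kind of reading described above whenever posts arrive faster than
waits complete.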

I tried looping until sem_wait succeeds, but that didn't work, and I
noticed that sem_post increments the semaphore value to higher than 1.
I guess the semaphores could be shared across the processes rather
than staying within a single process's memory space (I am simulating
three MPI processes on the same machine). My understanding, though, is
that each process has its own memory space, including its semaphore
values. Could this be true?

I also tried running the same program on a real HPC machine instead
of simulating it on a single machine, to test the possibility that a
semaphore retains its state across processes and not only across
threads within the same process. I got the same problem: "The call
was interrupted by a signal handler" on sem_wait, and sem_post
increases the value to over 1.

I used semaphores because I thought semaphore signals remain in
memory, so that the order of calls doesn't affect execution. I
previously tried pthread condition variables (pthread_cond_wait /
pthread_cond_signal), and that didn't work: I learned that if I
signal while nothing is waiting, the signal disappears. Semaphores
seemed to solve this problem, since I can wait either before or after
the post, and the wait blocks only if the semaphore value is zero;
otherwise (i.e. a post was already received) it decrements the value
and returns.
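
For comparison, the lost-signal behaviour of condition variables is
usually worked around by pairing the condition variable with a
counter protected by a mutex, so that a notification sent before
anyone waits is still recorded. The sketch below illustrates that
general pattern only; the type event_counter and the functions
event_init, event_notify and event_wait are hypothetical names, and
this is not what my program currently does.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical type: a counter guarded by a mutex, plus a condition
 * variable used only to wake up a sleeping waiter. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             pending;  /* notifications not yet consumed */
} event_counter;

void event_init(event_counter *e)
{
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->pending = 0;
}

/* Record one notification; it is kept even if nobody is waiting yet. */
void event_notify(event_counter *e)
{
    pthread_mutex_lock(&e->lock);
    e->pending++;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Consume one notification, sleeping only while none is pending.
 * The while loop also copes with spurious wakeups. */
void event_wait(event_counter *e)
{
    pthread_mutex_lock(&e->lock);
    while (e->pending == 0)
        pthread_cond_wait(&e->cond, &e->lock);
    e->pending--;
    pthread_mutex_unlock(&e->lock);
}
```

Because the state lives in pending rather than in the signal itself,
calling event_notify before event_wait does not lose the notification,
which is the property I was looking for in semaphores.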

Now this understanding doesn't seem to hold for semaphores either.

I am testing this on a single machine:
2.6.20-1.2944.fc6 #1 SMP Tue Apr 10 17:27:49 EDT 2007 i686 i686 i386 GNU/Linux

The gcc version is: gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)

The MPICH library details are:
Version:           1.0.4-rc1
Device:            ch3:sock
Configure Options: '-prefix=/home/mhelal/mpich2       -install'
'--enable-sharedlibs=gcc' '--enable-mpe'

The HPC machine is:
2.6.5-7.199-sn2 #1 SMP Thu Aug 18 09:17:57 UTC 2005 ia64 ia64 ia64 GNU/Linux

gcc (GCC) 3.3.3 (SuSE Linux)

but I actually use icc 8.1 here,

and the MPI is most probably mpiBLAST 1.4.0.

I appreciate any help with this issue, as it has taken up so much of
my time, and I am afraid I might be heading in completely the wrong
direction.

I would also appreciate being redirected to another forum (preferably
one run by the developers of the semaphores library for ANSI C on
Linux and/or the pthreads library) or to other concurrency gurus.

Thank you very much,
Manal
