Unexpected value-at for NULL'd pointer with pthreads

Hey all, this is my first post to this list so please forgive (and
correct me on) any faux pas with my question below.  I'm hoping I'm
either missing something simple or someone can explain in deeper
detail the inner workings of what gcc is doing to create my dilemma
below.

I have a program which creates a List (my struct) that contains an
array of Buffer (my struct) pointers along with some counters and
controlling pieces (reference counters, pthread_mutex_t locks, and
pthread_cond_t condition variables).

I have a function called list__remove() which removes a given buffer
from the List by shifting pointers down, free()-ing the buffer the
pointer refers to, and then setting the pointer to NULL.  All of this is
done under the protection of mutexes and proper conditions/broadcasts
to readers and writers.  In other words, writers always wait for
in-progress readers to finish, new readers always wait for pending
writers to finish, and only one writer may affect the list at a given
time.  I've tested this with hundreds of readers and several writers
and I believe it's solid.  Here is the function (with printf()s to
help explain the output afterward):

int list__remove(List *list, Buffer **buf) {
  /* Update the pending writers in case others are waiting. */
  pthread_mutex_lock(&list->lock);
  printf("%d : incrementing pending writers\n", pthread_self());
  list->pending_writers++;
  while(list->ref_count != 0)
    pthread_cond_wait(&list->writer_condition, &list->lock);
  printf("%d : decrementing pending writers\n", pthread_self());
  list->pending_writers--;

  /* If the *buf is NULL then another thread already removed it from
the list.  Signal, unlock, and leave happy. */
  if (*buf == NULL) {
    printf("%d : *buf is null so I'm leaving\n", pthread_self());
    if(list->pending_writers > 0) {
      pthread_cond_broadcast(&list->writer_condition);
    } else {
      pthread_cond_broadcast(&list->reader_condition);
    }
    pthread_mutex_unlock(&list->lock);
    return E_OK;
  }
  printf("%d : going to remove buf id %d for ptr %d  -- ref_count %d, lock_id %d\n",
         pthread_self(), (*buf)->id, *buf, (*buf)->ref_count, (*buf)->lock_id);

  /* At this point we own the list lock.  Try to victimize the buffer
so we can remove it. */
  int rv = buffer__victimize(*buf);
  if (rv != 0)
    show_err("The list__remove function received an error when trying "
             "to victimize the buffer (%d).\n", rv);

  /* We now have the list locked and the buffer fully victimized and
locked.  Store the lock_id and begin searching for index. */
  int low = 0, high = list->count, mid = 0;
  rv = E_OK;
  lockid_t lock_id = (*buf)->lock_id;
  for(;;) {
    // Reset mid and begin testing.  Start with boundary testing to
    // break if we're done.
    mid = (low + high)/2;
    if (low > high || mid >= list->count) {
      rv = E_BUFFER_NOT_FOUND;
      break;
    }

    // If the pool[mid] ID matches, we found the right index.
    // Collapse downward, update the list, null out *buf, and leave.
    if (list->pool[mid]->id == (*buf)->id) {
      for (int i=mid; i<list->count - 1; i++)
        list->pool[i] = list->pool[i+1];
      list->count--;
      free(*buf);
      *buf = NULL;
      lock__release(lock_id);
      printf("%d : free() and null done\n", pthread_self());
      break;
    }

    // If our current pool[mid] ID is too low, update low.
    if (list->pool[mid]->id < (*buf)->id) {
      low = mid + 1;
      continue;
    }

    // If the pool[mid] ID is too high, we need to update high.
    if (list->pool[mid]->id > (*buf)->id) {
      high = mid - 1;
      continue;
    }
  }

  /* Release the lock and broadcast to readers or writers as needed. */
  if(list->pending_writers > 0) {
    pthread_cond_broadcast(&list->writer_condition);
  } else {
    pthread_cond_broadcast(&list->reader_condition);
  }
  pthread_mutex_unlock(&list->lock);
  return rv;
}
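
For context, here is a minimal sketch of the reader/writer gating described
above, reduced to just the counters and condition variables.  The GateList
type and the gate__* names are hypothetical stand-ins rather than my real
code, and the reader side is an assumption based on the rules stated earlier
(new readers wait for pending writers, writers wait for in-progress readers).

```c
#include <pthread.h>

/* Hypothetical, pared-down stand-in for the List struct. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t reader_condition;
  pthread_cond_t writer_condition;
  int ref_count;        /* readers currently using the list */
  int pending_writers;  /* writers waiting for readers to drain */
} GateList;

void gate__init(GateList *l) {
  pthread_mutex_init(&l->lock, NULL);
  pthread_cond_init(&l->reader_condition, NULL);
  pthread_cond_init(&l->writer_condition, NULL);
  l->ref_count = 0;
  l->pending_writers = 0;
}

/* Reader entry: wait while any writer is pending, then pin the list. */
void gate__reader_enter(GateList *l) {
  pthread_mutex_lock(&l->lock);
  while (l->pending_writers > 0)
    pthread_cond_wait(&l->reader_condition, &l->lock);
  l->ref_count++;
  pthread_mutex_unlock(&l->lock);
}

/* Reader exit: unpin; the last reader out wakes any waiting writers. */
void gate__reader_exit(GateList *l) {
  pthread_mutex_lock(&l->lock);
  l->ref_count--;
  if (l->ref_count == 0 && l->pending_writers > 0)
    pthread_cond_broadcast(&l->writer_condition);
  pthread_mutex_unlock(&l->lock);
}

/* Writer entry: same predicate as the top of list__remove().
   Returns with l->lock held. */
void gate__writer_enter(GateList *l) {
  pthread_mutex_lock(&l->lock);
  l->pending_writers++;
  while (l->ref_count != 0)
    pthread_cond_wait(&l->writer_condition, &l->lock);
  l->pending_writers--;
}

/* Writer exit: same broadcast-and-unlock as the tail of list__remove(). */
void gate__writer_exit(GateList *l) {
  if (l->pending_writers > 0)
    pthread_cond_broadcast(&l->writer_condition);
  else
    pthread_cond_broadcast(&l->reader_condition);
  pthread_mutex_unlock(&l->lock);
}
```

Because the writer holds the mutex from gate__writer_enter() until
gate__writer_exit(), only one writer can be modifying the list at a time.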


The problem I'm experiencing occurs when two threads attempt to remove
the same buffer in close succession.  To emulate this I made 0 reader
threads and 2 writer threads start back-to-back and told them both to
remove buffer ID 777.  I expect the first thread to arrive at
list__remove() to move past the predicate at the top of the function
and block the subsequent thread.  It should then find my buffer,
remove it, and set the *buf pointer to NULL after free()-ing it.  I
then expect my other thread to unblock, see that *buf == NULL, and
leave because of the if() statement it encounters.  Instead, I get the
following flow (the number on the left is the pthread_self()
identifier):

-1287739648 : starting search
-1296132352 : starting search
-1287739648 : locked buffer
-1287739648 : buffer wants a delta of 1
-1287739648 : unlocking buffer
-1287739648 : done searching, rv is 0
-1287739648 : locked buffer
-1287739648 : updating ref (rv is 0, vict is 0, count is 1)
-1287739648 : buffer wants a delta of -1
-1287739648 : unlocking buffer
-1287739648 : incrementing pending writers
-1296132352 : locked buffer
-1296132352 : buffer wants a delta of 1
-1296132352 : unlocking buffer
-1296132352 : done searching, rv is 0
-1296132352 : locked buffer
-1296132352 : updating ref (rv is 0, vict is 0, count is 1)
-1296132352 : buffer wants a delta of -1
-1296132352 : unlocking buffer
-1287739648 : decrementing pending writers
-1287739648 : going to remove buf id 777 for pointer 16396688  --
ref_count 0, lock_id 777
-1287739648 : locked buffer
-1287739648 : marking buffer victimized
-1287739648 : pool shifted; free() and null assignment done
-1287739648 : list__remove gave return code 0
-1287739648 : removed a buffer
-1296132352 : incrementing pending writers
-1296132352 : decrementing pending writers
-1296132352 : going to remove buf id 0 for pointer 16396688  --
ref_count 0, lock_id 777
-1296132352 : locked buffer
-1296132352 : marking buffer victimized
-1296132352 : list__remove gave return code 120

My locking and conditions/signals appear to be keeping thread
synchronization in check.  So how on Earth can thread 1, while under
the protection of a mutex, set *buf = NULL, and yet thread 2, holding
the same mutex, not see that?  Thread 1 is clearly affecting *buf
because thread 2 is reading (*buf)->id as 0 when it used to be 777.
Why doesn't thread 2 see *buf == NULL even after we assert() it in
thread 1?
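
Whether thread 2 can ever observe the NULL depends on what each thread's buf
argument actually points at, which list__remove() alone does not show.  The
sketch below is an illustration of that dependency, not a diagnosis.  It uses
hypothetical names and a stripped-down Buffer, and it assumes nothing about
how the real callers obtain their pointers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real Buffer; only the id matters here. */
typedef struct { int id; } Buffer;

/* Mimics the last step of list__remove(): NULL the pointer passed in. */
static void clear_through(Buffer **buf) { *buf = NULL; }

/* Case A: each caller keeps its own Buffer * holding the same address.
   Clearing one caller's variable leaves the other caller's untouched. */
int private_copy_survives(void) {
  Buffer target = { 777 };
  Buffer *first_caller = &target;
  Buffer *second_caller = &target;
  clear_through(&first_caller);
  return first_caller == NULL && second_caller == &target;
}

/* Case B: both callers hold the address of one shared slot.
   Clearing through either one is visible through the other. */
int shared_slot_is_cleared(void) {
  Buffer target = { 777 };
  Buffer *slot = &target;
  Buffer **first_caller = &slot;
  Buffer **second_caller = &slot;
  clear_through(first_caller);
  return *second_caller == NULL;
}
```

Both functions return 1.  The flow I expected corresponds to case B.  In case
A the second caller would still hold the old address after the free(), which
would be consistent with thread 2 printing the same pointer value but a
different id.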

My apologies if this is a simple error, I've spent many, many hours
searching for help and I'm just not getting it.

Thanks in advance for any help!


