On Fri, Dec 02, 2011 at 12:46:25PM -0500, Christoph Hellwig wrote:
> On a sufficiently corrupt filesystem walking the btree nodes might hit the
> same node again, which currently will deadlock.  Use a recursion
> counter to avoid the direct deadlock and let the normal loop detection
> (two bad nodes and out) do its work.  This is how repair behaved before
> we added the lock when implementing buffer prefetching.
>
> Reported-by: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
> Tested-by: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
>
> Index: xfsprogs-dev/include/libxfs.h
> ===================================================================
> --- xfsprogs-dev.orig/include/libxfs.h	2011-11-22 22:28:23.000000000 +0000
> +++ xfsprogs-dev/include/libxfs.h	2011-11-22 22:34:27.000000000 +0000
> @@ -226,6 +226,8 @@ typedef struct xfs_buf {
>  	unsigned		b_bcount;
>  	dev_t			b_dev;
>  	pthread_mutex_t		b_lock;
> +	pthread_t		b_holder;
> +	unsigned int		b_recur;
>  	void			*b_fsprivate;
>  	void			*b_fsprivate2;
>  	void			*b_fsprivate3;
> Index: xfsprogs-dev/libxfs/rdwr.c
> ===================================================================
> --- xfsprogs-dev.orig/libxfs/rdwr.c	2011-11-22 22:28:23.000000000 +0000
> +++ xfsprogs-dev/libxfs/rdwr.c	2011-11-22 22:40:01.000000000 +0000
> @@ -342,6 +342,8 @@ libxfs_initbuf(xfs_buf_t *bp, dev_t devi
>  	list_head_init(&bp->b_lock_list);
>  #endif
>  	pthread_mutex_init(&bp->b_lock, NULL);
> +	bp->b_holder = 0;
> +	bp->b_recur = 0;
>  }
>
>  xfs_buf_t *
> @@ -410,18 +412,24 @@ libxfs_getbuf_flags(dev_t device, xfs_da
>  		return NULL;
>
>  	if (use_xfs_buf_lock) {
> -		if (flags & LIBXFS_GETBUF_TRYLOCK) {
> -			int ret;
> +		int ret;
>
> -			ret = pthread_mutex_trylock(&bp->b_lock);
> -			if (ret) {
> -				ASSERT(ret == EAGAIN);
> -				cache_node_put(libxfs_bcache, (struct cache_node *)bp);
> -				return NULL;
> +		ret = pthread_mutex_trylock(&bp->b_lock);
> +		if (ret) {
> +			ASSERT(ret == EAGAIN);
> +			if (flags & LIBXFS_GETBUF_TRYLOCK)
> +				goto out_put;
> +
> +			if (pthread_equal(bp->b_holder, pthread_self())) {
> +				fprintf(stderr,
> +	_("recursive buffer locking detected\n"));

"Warning: recursive buffer locking @ bno %lld detected" might be more
informative, especially about the severity of the issue.

> +				bp->b_recur++;
> +			} else {
> +				pthread_mutex_lock(&bp->b_lock);
>  			}
> -		} else {
> -			pthread_mutex_lock(&bp->b_lock);
>  		}
> +
> +		bp->b_holder = pthread_self();

That should probably only be written in the branches where the lock is
actually taken, not every time through here.

Also, it might be worth commenting on why checking bp->b_holder without
holding a lock is not racy: the holder is initialised to zero and
cleared before the buffer lock is dropped, so when a concurrent lookup
fails the trylock, the value of b_holder will never match the ID of the
thread that failed.

Otherwise, looks good.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

-- 
Dave Chinner
david@xxxxxxxxxxxxx
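
For illustration only, here is a minimal standalone sketch of the locking scheme with both review comments folded in: the holder is recorded only on the paths that actually take the mutex, and it is cleared before the mutex is dropped. This is not the xfsprogs code. `struct demo_buf`, `demo_buf_lock()`, `demo_buf_unlock()` and the `has_holder` flag are hypothetical names made up for this sketch; the flag stands in for the patch's "b_holder == 0" convention, since pthread_t is an opaque type in POSIX. The unlock side is an assumption based on the "cleared before the buffer lock is dropped" remark, as that hunk is not quoted above. The sketch tests for EBUSY, which is the value POSIX specifies for pthread_mutex_trylock() on an already-locked mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the locking fields of xfs_buf_t. */
struct demo_buf {
	pthread_mutex_t	lock;
	pthread_t	holder;		/* meaningful only while has_holder is set */
	bool		has_holder;	/* assumption: replaces "b_holder == 0" */
	unsigned int	recur;		/* extra acquisitions by the holder */
	long long	bno;		/* block number, used in the warning only */
};

static void demo_buf_init(struct demo_buf *bp, long long bno)
{
	pthread_mutex_init(&bp->lock, NULL);
	bp->has_holder = false;
	bp->recur = 0;
	bp->bno = bno;
}

/* Record ownership; called only with bp->lock held. */
static void demo_buf_set_holder(struct demo_buf *bp)
{
	bp->holder = pthread_self();
	bp->has_holder = true;
}

/*
 * Returns 0 once the caller owns the buffer, EBUSY if trylock_only is set
 * and the mutex is already held, or another pthread error code.
 */
static int demo_buf_lock(struct demo_buf *bp, bool trylock_only)
{
	int ret = pthread_mutex_trylock(&bp->lock);

	if (ret == 0) {
		demo_buf_set_holder(bp);	/* lock taken: record holder */
		return 0;
	}
	if (ret != EBUSY)
		return ret;
	if (trylock_only)
		return EBUSY;	/* same order as the patch: give up first */

	/*
	 * Unlocked read of the holder. Following the reasoning in the
	 * review, this can only match when the calling thread itself
	 * holds the mutex, because any other holder stores its own ID
	 * and clears it before unlocking.
	 */
	if (bp->has_holder && pthread_equal(bp->holder, pthread_self())) {
		fprintf(stderr,
			"Warning: recursive buffer locking @ bno %lld detected\n",
			bp->bno);
		bp->recur++;	/* holder is already correct; do not rewrite */
		return 0;
	}

	ret = pthread_mutex_lock(&bp->lock);	/* held by another thread */
	if (ret)
		return ret;
	demo_buf_set_holder(bp);
	return 0;
}

static void demo_buf_unlock(struct demo_buf *bp)
{
	assert(bp->has_holder && pthread_equal(bp->holder, pthread_self()));

	if (bp->recur) {
		bp->recur--;	/* unwind one recursive acquisition */
		return;
	}
	bp->has_holder = false;	/* clear before the mutex is dropped */
	pthread_mutex_unlock(&bp->lock);
}
```

With this shape, a thread that calls demo_buf_lock() twice on the same buffer gets a warning and a recursion count of 1 instead of deadlocking, and it has to call demo_buf_unlock() twice before the mutex is really released.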