On Sun, Feb 14, 2016 at 05:31:10PM -0500, Mike Marshall wrote:
> I added the list_del...
>
> Everything is very resilient, I killed
> the client-core over and over while dbench
> was running at the same time as ls -R
> was running, and the client-core always
> restarted... until finally, it didn't. I guess
> related to the state of just what was going on
> at the time... Hit the WARN_ON in service_operation,
> and then oopsed on the orangefs_bufmap_put
> down at the end of wait_for_direct_io...

Bloody hell... I think I see what's going on, and presumably the newer
slot allocator would fix that. Look: closing the control device (== daemon
death) checks if we have a bufmap installed and drops a reference to it in
that case. The reason why it's conditional is that we might not have
gotten around to installing one (it's done via ioctl on the control
device). But ->release() does *NOT* wait for all references to go away!

In other words, it's possible to restart the daemon while the old bufmap
is still there, and then have it killed after it has opened the control
device and before the old bufmap has run down. To ->release() it looks
like we *have* gotten around to installing a bufmap and need the reference
dropped. In reality, the reference acquired when we were installing that
one has already been dropped, so we get a double put. With expected
results...

If the patch below ends up fixing the symptoms, the analysis above has a
good chance to be correct. This is no way to wait for rundown, of course;
I'm not suggesting it as the solution, just as a way to narrow down what's
going on.

Incidentally, could you fold the list_del() part into the offending commit
(orangefs: delay freeing slot until cancel completes) and repush your
for-next?
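To make the sequence above concrete, here is a minimal userspace model in C. Everything in it (struct bufmap, installed_bufmap, release_buggy(), release_owned(), replay()) is hypothetical and only mimics the reference counting described above; it is not orangefs code. release_buggy() drops a reference whenever *some* bufmap is installed, which is the conditional in ->release(). release_owned() is one possible alternative, assumed here purely for illustration, that drops only the reference taken through the same open of the control device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: not the real orangefs bufmap. */
struct bufmap {
	int refcount;  /* install reference + one per in-flight I/O */
};

/* Stands in for the global "is a bufmap installed?" state. */
static struct bufmap *installed_bufmap;

/* Returns 0 on success, -1 if the put would underflow (the "oops"). */
static int bufmap_put(struct bufmap *bm)
{
	if (bm == NULL || bm->refcount <= 0)
		return -1;
	if (--bm->refcount == 0)
		installed_bufmap = NULL;  /* last reference gone: bufmap runs down */
	return 0;
}

/* Per-open state of the control device (hypothetical). */
struct daemon_file {
	struct bufmap *owned;  /* bufmap installed through this open, if any */
};

/* Models the install ioctl: takes the install reference. */
static void daemon_install(struct daemon_file *f, struct bufmap *bm)
{
	bm->refcount = 1;
	installed_bufmap = bm;
	f->owned = bm;
}

/* Buggy release: drops a reference whenever *any* bufmap is installed. */
static int release_buggy(struct daemon_file *f)
{
	(void)f;
	if (installed_bufmap != NULL)
		return bufmap_put(installed_bufmap);
	return 0;
}

/* Hypothetical alternative: drop only the reference this open took. */
static int release_owned(struct daemon_file *f)
{
	struct bufmap *bm = f->owned;

	f->owned = NULL;
	return bm != NULL ? bufmap_put(bm) : 0;
}

/* Replays the sequence from the analysis; returns the result of the
 * in-flight I/O's final put (-1 means a double put). */
static int replay(int (*release)(struct daemon_file *))
{
	static struct bufmap bm;
	struct daemon_file old_daemon = { NULL };
	struct daemon_file new_daemon = { NULL };

	installed_bufmap = NULL;
	daemon_install(&old_daemon, &bm);  /* refcount 1 */
	bm.refcount++;                     /* direct I/O in flight: 2 */
	release(&old_daemon);              /* daemon killed: 1, still installed */
	release(&new_daemon);              /* restarted daemon killed before install;
	                                      the buggy version puts again here */
	return bufmap_put(&bm);            /* I/O completes and drops its ref */
}
```

With the buggy release, the restarted daemon's ->release() consumes the reference that belongs to the in-flight I/O, so replay(release_buggy) returns -1, the analogue of the oops in orangefs_bufmap_put at the end of wait_for_direct_io; replay(release_owned) returns 0. The model does nothing about waiting for rundown; it only shows why the conditional put is wrong once a second daemon instance can race with the old bufmap.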
diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index 6a7df12..630246d 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file)
 	purge_inprogress_ops();
 	gossip_debug(GOSSIP_DEV_DEBUG,
 		     "pvfs2-client-core: device close complete\n");
+	/* VERY CRUDE, NOT FOR MERGE */
+	while (orangefs_get_bufmap_init())
+		schedule_timeout(HZ);
 	open_access_count = 0;
 	mutex_unlock(&devreq_mutex);
 	return 0;
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 41f8bb1f..1e28555 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
 {
 	spin_lock(&op->lock);
 	if (unlikely(op_is_cancel(op))) {
+		list_del(&op->list);
 		spin_unlock(&op->lock);
 		put_cancel(op);
 	} else {
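The orangefs-kernel.h hunk unlinks a cancel op from its list before put_cancel() drops what may be the last reference to it. Presumably this matters because the purge code walks that list, and an op freed while still linked would leave a dangling entry behind. The userspace sketch below models that ordering with a hand-rolled intrusive list; struct list_node, struct op, put_op(), make_queue() and purge() are hypothetical stand-ins for the kernel's list_head, the op structure and put_cancel(), not their real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list (stand-in for struct list_head). */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;  /* leave the node self-linked */
}

static size_t list_len(const struct list_node *head)
{
	size_t n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical op, refcounted like a cancel op. */
struct op {
	struct list_node list;
	int refcount;
};

/* Stand-in for put_cancel(): frees the op on the last put. */
static void put_op(struct op *op)
{
	if (--op->refcount == 0)
		free(op);
}

/* Queue n ops, each holding one reference; returns how many were queued. */
static int make_queue(struct list_node *head, int n)
{
	int added = 0;

	list_init(head);
	for (int i = 0; i < n; i++) {
		struct op *op = malloc(sizeof(*op));

		if (op == NULL)
			break;
		op->refcount = 1;  /* the queue's reference */
		list_add_tail(&op->list, head);
		added++;
	}
	return added;
}

/* Purge every op: unlink first, then drop the reference that may free it. */
static void purge(struct list_node *head)
{
	struct list_node *pos = head->next;

	while (pos != head) {
		struct list_node *next = pos->next;  /* saved before op can be freed */
		struct op *op = (struct op *)((char *)pos - offsetof(struct op, list));

		list_del(&op->list);
		put_op(op);
		pos = next;
	}
}
```

Dropping the list_del() call from purge() would leave head pointing at freed memory once the loop finishes, which is the kind of use-after-free the extra list_del() in set_op_state_purged() appears intended to avoid.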