Re: Orangefs ABI documentation

On Sun, Feb 14, 2016 at 05:31:10PM -0500, Mike Marshall wrote:
> I added the list_del...
> 
> Everything is very resilient, I killed
> the client-core over and over while dbench
> was running at the same time as  ls -R
> was running, and the client-core always
> restarted... until finally, it didn't. I guess
> related to the state of just what was going on
> at the time... Hit the WARN_ON in service_operation,
> and then oopsed on the orangefs_bufmap_put
> down at the end of wait_for_direct_io...

Bloody hell...  I think I see what's going on, and presumably the newer
slot allocator would fix that.  Look: closing the control device (== daemon
death) checks if we have a bufmap installed and drops a reference to
it in that case.  The reason it's conditional is that we might not have
gotten around to installing one (it's done via ioctl on the control
device).  But ->release() does *NOT* wait for all references to go away!
In other words, it's possible to restart the daemon while the old bufmap
is still there.  Then have it killed after it has opened the control device
and before the old bufmap has run down.  To ->release() it looks like
we *have* gotten around to installing a bufmap and need the reference dropped.
In reality, the reference acquired when we were installing that one has
already been dropped, so we get a double put.  With the expected results...
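
To make the sequence concrete, here is a rough user-space model of it.
It is only a sketch: struct bufmap, install(), dev_release() and
simulate_double_put() are made-up names for illustration and do not
match the real code in fs/orangefs, and the "free" is modelled as
nothing more than the counter going to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the bufmap; not the kernel structure. */
struct bufmap {
	int refcnt;
};

/* What the "is a bufmap installed?" check looks at. */
static struct bufmap *current_map;

static void bufmap_get(struct bufmap *m)
{
	m->refcnt++;
}

static void bufmap_put(struct bufmap *m)
{
	/* Last reference gone: the map stops being visible (rundown done). */
	if (--m->refcnt == 0 && current_map == m)
		current_map = NULL;
}

/* Models the ioctl on the control device: takes the install-time reference. */
static void install(struct bufmap *m)
{
	assert(m != NULL);
	m->refcnt = 1;
	current_map = m;
}

/* Models ->release(): drops a reference if a map *appears* installed. */
static void dev_release(void)
{
	if (current_map)
		bufmap_put(current_map);
}

/* Replays the race and returns the final count; negative means double put. */
int simulate_double_put(void)
{
	struct bufmap old = { 0 };

	install(&old);		/* daemon #1 installs the map: count 1 */
	bufmap_get(&old);	/* in-flight I/O pins it: count 2 */
	dev_release();		/* daemon #1 dies: count 1, map still visible */

	/* daemon #2 opens the device and is killed before its own install */
	dev_release();		/* drops the I/O's reference by mistake: count 0 */

	bufmap_put(&old);	/* I/O finishes and drops "its" reference: count -1 */
	return old.refcnt;
}
```

In the model the second dev_release() takes the count to zero while the
I/O still believes it holds a reference, so the final put underflows.
In the kernel that last put would presumably be hitting a map that has
already been torn down.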

If the patch below ends up fixing the symptoms, the analysis above has a
good chance of being correct.  This is no way to wait for rundown, of
course - I'm not suggesting it as the solution, just as a way to narrow
down what's going on.

Incidentally, could you fold the list_del() part into the offending commit
(orangefs: delay freeing slot until cancel completes) and repush your
for-next?

diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index 6a7df12..630246d 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file)
 	purge_inprogress_ops();
 	gossip_debug(GOSSIP_DEV_DEBUG,
 		     "pvfs2-client-core: device close complete\n");
+	/* VERY CRUDE, NOT FOR MERGE */
+	while (orangefs_get_bufmap_init())
+		schedule_timeout_uninterruptible(HZ);
 	open_access_count = 0;
 	mutex_unlock(&devreq_mutex);
 	return 0;
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 41f8bb1f..1e28555 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
 {
 	spin_lock(&op->lock);
 	if (unlikely(op_is_cancel(op))) {
+		list_del(&op->list);
 		spin_unlock(&op->lock);
 		put_cancel(op);
 	} else {