I pushed the list_del up to the kernel.org for-next branch... And I've
been running tests with the CRUDE bandaid... weird results...

No oopses, no WARN_ONs... I was running dbench and ls -R or find and
kill-minus-nining different ones of them with no perceived resulting
problems, so I moved on to signalling the client-core to abort... it
restarted numerous times, and then stuff wedged up differently than
I've seen before.

Usually I kill the client-core and it comes back (gets restarted) as
seen by the different PID:

# ps -ef | grep pvfs
root      1292  1185  7 11:39 ?        00:00:01 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log
# kill -6 1292
# ps -ef | grep pvfs
root      1299  1185  8 11:40 ?        00:00:00 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log

Until once, it didn't die, and the gorked up unkillable left-over
thing's argv[0] (or wherever this string gets scraped from) was goofy:

# ps -ef | grep pvfs
root      1324  1185  1 11:41 ?        00:00:02 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log
[root@be1 hubcap]# kill -6 1324
[root@be1 hubcap]# ps -ef | grep pvfs
root      1324  1185  2 11:41 ?        00:00:05 [pvfs2-client-co]

The virtual host was pretty wedged up after that, I couldn't look at
anything interesting, and got a bunch of terminal windows hung trying:

# strace -f -p 1324
Process 1324 attached
^C
^C^C

. ls -R's output was flowing out here
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name52
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name53
^C^C^C

[root@logtruck hubcap]# ssh be1
root@be1's password:
Last login: Mon Feb 15 11:33:42 2016 from logtruck.clemson.edu
[root@be1 ~]# df

I still had one functioning window, and looked at dmesg from there,
nothing interesting there... a couple of expected tag WARNINGS while
I was killing finds and dbenches... ioctls that happened during the
successful restarts of the client-core...

[  809.520966] client-core: opening device
[  809.521031] pvfs2-client-core: open device complete (ret = 0)
[  809.521050] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[  809.521068] dispatch_ioctl_command: client debug array string has been received.
[  809.521070] orangefs_prepare_debugfs_help_string: start
[  809.521071] orangefs_prepare_cdm_array: start
[  809.521104] orangefs_prepare_cdm_array: rc:50:
[  809.521106] orangefs_prepare_debugfs_help_string: cdm_element_count:50:
[  809.521239] debug_mask_to_string: start
[  809.521242] debug_mask_to_string: string:none:
[  809.521243] orangefs_client_debug_init: start
[  809.521249] orangefs_client_debug_init: rc:0:
[  809.566652] dispatch_ioctl_command: got ORANGEFS_DEV_REMOUNT_ALL
[  809.566667] dispatch_ioctl_command: priority remount in progress
[  809.566668] dispatch_ioctl_command: priority remount complete
[  812.454255] orangefs_debug_open: orangefs_debug_disabled: 0
[  812.454294] orangefs_debug_open: rc: 0
[  812.454320] orangefs_debug_write: kernel-debug
[  812.454323] debug_string_to_mask: start
[  896.410522] WARNING: No one's waiting for tag 15612
[ 1085.339948] WARNING: No one's waiting for tag 127943
[ 1146.820485] orangefs: please confirm that pvfs2-client daemon is running.
[ 1146.820488] fs/orangefs/dir.c line 264: orangefs_readdir: orangefs_readdir_index_get() failure (-5)
[ 1146.866812] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1146.866834] dispatch_ioctl_command: client debug array string has been received.
[ 1175.906800] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1175.906817] dispatch_ioctl_command: client debug array string has been received.
[ 1223.915862] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1223.915880] dispatch_ioctl_command: client debug array string has been received.
[ 1274.458852] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1274.458870] dispatch_ioctl_command: client debug array string has been received.
[root@be1 hubcap]#

ps aux shows every process' state as S except for 1324, which is
racking up time:

[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:29 [pvfs2-client-co]
[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:30 [pvfs2-client-co]

I'll virsh destroy this thing now <g>...

-Mike

On Sun, Feb 14, 2016 at 6:43 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Feb 14, 2016 at 05:31:10PM -0500, Mike Marshall wrote:
>> I added the list_del...
>>
>> Everything is very resilient, I killed
>> the client-core over and over while dbench
>> was running at the same time as ls -R
>> was running, and the client-core always
>> restarted... until finally, it didn't.  I guess
>> related to the state of just what was going on
>> at the time... Hit the WARN_ON in service_operation,
>> and then oopsed on the orangefs_bufmap_put
>> down at the end of wait_for_direct_io...
>
> Bloody hell... I think I see what's going on, and presumably the newer
> slot allocator would fix that.  Look: closing control device (== daemon
> death) checks if we have a bufmap installed and drops a reference to
> it in that case.  The reason why it's conditional is that we might have
> not gotten around to installing one (it's done via ioctl on control
> device).  But ->release() does *NOT* wait for all references to go away!
> In other words, it's possible to restart the daemon while the old bufmap
> is still there.  Then have it killed after it has opened control devices
> and before the old bufmap has run down.  For ->release() it looks like
> we *have* gotten around to installing bufmap, and need the reference
> dropped.  In reality, the reference acquired when we were installing
> that one has already been dropped, so we get double put.  With expected
> results...
>
> If below ends up fixing the symptoms, analysis above has a good chance
> to be correct.  This is no way to wait for rundown, of course - I'm not
> suggesting it as the solution, just as a way to narrow down what's
> going on.
>
> Incidentally, could you fold the list_del() part into offending commit
> (orangefs: delay freeing slot until cancel completes) and repush your
> for-next?
>
> diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
> index 6a7df12..630246d 100644
> --- a/fs/orangefs/devorangefs-req.c
> +++ b/fs/orangefs/devorangefs-req.c
> @@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file)
>  	purge_inprogress_ops();
>  	gossip_debug(GOSSIP_DEV_DEBUG,
>  		     "pvfs2-client-core: device close complete\n");
> +	/* VERY CRUDE, NOT FOR MERGE */
> +	while (orangefs_get_bufmap_init())
> +		schedule_timeout(HZ);
>  	open_access_count = 0;
>  	mutex_unlock(&devreq_mutex);
>  	return 0;
> diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
> index 41f8bb1f..1e28555 100644
> --- a/fs/orangefs/orangefs-kernel.h
> +++ b/fs/orangefs/orangefs-kernel.h
> @@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
>  {
>  	spin_lock(&op->lock);
>  	if (unlikely(op_is_cancel(op))) {
> +		list_del(&op->list);
>  		spin_unlock(&op->lock);
>  		put_cancel(op);
>  	} else {
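
For what it's worth, the double-put scenario described above can be
boiled down to a self-contained toy program (plain C, compilable on its
own; the toy_* names are made up for illustration and are not the
orangefs API).  The point it demonstrates is the one in the analysis:
->release() decides whether to drop the bufmap reference based only on
whether a bufmap still *appears* to be installed, so a second release
that never took an install reference of its own still does a put while
the old instance's rundown is in flight:

/*
 * Toy sketch, not the orangefs code: a refcounted "bufmap" plus a
 * release path that puts it whenever one looks installed.
 */
#include <stdio.h>

struct toy_bufmap {
	int refcount;
};

static struct toy_bufmap the_map = { .refcount = 0 };
static struct toy_bufmap *installed;	/* non-NULL while a bufmap is installed */

static void toy_get(struct toy_bufmap *m)
{
	m->refcount++;
}

static void toy_put(struct toy_bufmap *m)
{
	if (--m->refcount == 0) {
		printf("bufmap freed\n");
		installed = NULL;
	} else if (m->refcount < 0) {
		printf("BUG: double put, refcount = %d\n", m->refcount);
	}
}

/* daemon's setup ioctl: install the bufmap, taking the reference
 * that release() is later supposed to drop */
static void toy_install(void)
{
	toy_get(&the_map);
	installed = &the_map;
}

/* control-device ->release(): drop the install reference *if* a
 * bufmap looks installed - the conditional put in question */
static void toy_release(void)
{
	if (installed)
		toy_put(&the_map);
}

int main(void)
{
	toy_install();		/* daemon #1 starts and installs the bufmap  */
	toy_get(&the_map);	/* in-flight I/O still holds a reference     */
	toy_release();		/* daemon #1 dies: install ref dropped, but  */
				/* the I/O ref keeps the bufmap "installed"  */

	/* daemon #2 starts, opens the control device, and is killed
	 * before it gets around to its own install ioctl ...        */
	toy_release();		/* ... yet release() sees "installed", puts  */

	toy_put(&the_map);	/* I/O finally drops its ref - but the map   */
				/* was already torn down out from under it   */
	return 0;
}

Running it prints "bufmap freed" while the in-flight reference is still
held, followed by the refcount underflow - the same shape of failure as
the oops on the orangefs_bufmap_put at the end of wait_for_direct_io.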