I pushed the list_del up to the kernel.org for-next branch... And I've
been running tests with the CRUDE bandaid... weird results...

No oopses, no WARN_ONs... I was running dbench and ls -R or find and
kill-minus-nining different ones of them with no perceived resulting
problems, so I moved on to signalling the client-core to abort... it
restarted numerous times, and then stuff wedged up differently than
I've seen before.

Usually I kill the client-core and it comes back (gets restarted) as
seen by the different PID:

# ps -ef | grep pvfs
root      1292  1185  7 11:39 ?        00:00:01 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log
# kill -6 1292
# ps -ef | grep pvfs
root      1299  1185  8 11:40 ?        00:00:00 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log

Until once, it didn't die, and the gorked up unkillable left-over
thing's argv[0] (or wherever this string gets scraped from) was goofy:

# ps -ef | grep pvfs
root      1324  1185  1 11:41 ?        00:00:02 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log
[root@be1 hubcap]# kill -6 1324
[root@be1 hubcap]# ps -ef | grep pvfs
root      1324  1185  2 11:41 ?        00:00:05 [pvfs2-client-co]

The virtual host was pretty wedged up after that, I couldn't look at
anything interesting, and got a bunch of terminal windows hung trying:

# strace -f -p 1324
Process 1324 attached
^C
^C^C

. ls -R's output was flowing out here
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name52
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name53
^C^C^C

[root@logtruck hubcap]# ssh be1
root@be1's password:
Last login: Mon Feb 15 11:33:42 2016 from logtruck.clemson.edu
[root@be1 ~]# df

I still had one functioning window, and looked at dmesg from there,
nothing interesting there... a couple of expected tag WARNINGS while
I was killing finds and dbenches... ioctls that happened during the
successful restarts of the client-core...

[  809.520966] client-core: opening device
[  809.521031] pvfs2-client-core: open device complete (ret = 0)
[  809.521050] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[  809.521068] dispatch_ioctl_command: client debug array string has been received.
[  809.521070] orangefs_prepare_debugfs_help_string: start
[  809.521071] orangefs_prepare_cdm_array: start
[  809.521104] orangefs_prepare_cdm_array: rc:50:
[  809.521106] orangefs_prepare_debugfs_help_string: cdm_element_count:50:
[  809.521239] debug_mask_to_string: start
[  809.521242] debug_mask_to_string: string:none:
[  809.521243] orangefs_client_debug_init: start
[  809.521249] orangefs_client_debug_init: rc:0:
[  809.566652] dispatch_ioctl_command: got ORANGEFS_DEV_REMOUNT_ALL
[  809.566667] dispatch_ioctl_command: priority remount in progress
[  809.566668] dispatch_ioctl_command: priority remount complete
[  812.454255] orangefs_debug_open: orangefs_debug_disabled: 0
[  812.454294] orangefs_debug_open: rc: 0
[  812.454320] orangefs_debug_write: kernel-debug
[  812.454323] debug_string_to_mask: start
[  896.410522] WARNING: No one's waiting for tag 15612
[ 1085.339948] WARNING: No one's waiting for tag 127943
[ 1146.820485] orangefs: please confirm that pvfs2-client daemon is running.
[ 1146.820488] fs/orangefs/dir.c line 264: orangefs_readdir: orangefs_readdir_index_get() failure (-5)
[ 1146.866812] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1146.866834] dispatch_ioctl_command: client debug array string has been received.
[ 1175.906800] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1175.906817] dispatch_ioctl_command: client debug array string has been received.
[ 1223.915862] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1223.915880] dispatch_ioctl_command: client debug array string has been received.
[ 1274.458852] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1274.458870] dispatch_ioctl_command: client debug array string has been received.
[root@be1 hubcap]#

ps aux shows every process' state as S except for 1324, which is
racking up time:

[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:29 [pvfs2-client-co]
[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:30 [pvfs2-client-co]

I'll virsh destroy this thing now <g>...

-Mike

On Sun, Feb 14, 2016 at 6:43 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Feb 14, 2016 at 05:31:10PM -0500, Mike Marshall wrote:
>> I added the list_del...
>>
>> Everything is very resilient, I killed
>> the client-core over and over while dbench
>> was running at the same time as ls -R
>> was running, and the client-core always
>> restarted... until finally, it didn't.  I guess
>> related to the state of just what was going on
>> at the time... Hit the WARN_ON in service_operation,
>> and then oopsed on the orangefs_bufmap_put
>> down at the end of wait_for_direct_io...
>
> Bloody hell... I think I see what's going on, and presumably the newer
> slot allocator would fix that.  Look: closing control device (== daemon
> death) checks if we have a bufmap installed and drops a reference to
> it in that case.  The reason why it's conditional is that we might have
> not gotten around to installing one (it's done via ioctl on control
> device).  But ->release() does *NOT* wait for all references to go away!
> In other words, it's possible to restart the daemon while the old bufmap
> is still there.  Then have it killed after it has opened control devices
> and before the old bufmap has run down.  For ->release() it looks like
> we *have* gotten around to installing bufmap, and need the reference
> dropped.  In reality, the reference acquired when we were installing
> that one has already been dropped, so we get double put.  With expected
> results...
>
> If below ends up fixing the symptoms, analysis above has a good chance
> to be correct.  This is no way to wait for rundown, of course - I'm not
> suggesting it as the solution, just as a way to narrow down what's
> going on.
>
> Incidentally, could you fold the list_del() part into offending commit
> (orangefs: delay freeing slot until cancel completes) and repush your
> for-next?
>
> diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
> index 6a7df12..630246d 100644
> --- a/fs/orangefs/devorangefs-req.c
> +++ b/fs/orangefs/devorangefs-req.c
> @@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file)
>  	purge_inprogress_ops();
>  	gossip_debug(GOSSIP_DEV_DEBUG,
>  		     "pvfs2-client-core: device close complete\n");
> +	/* VERY CRUDE, NOT FOR MERGE */
> +	while (orangefs_get_bufmap_init())
> +		schedule_timeout(HZ);
>  	open_access_count = 0;
>  	mutex_unlock(&devreq_mutex);
>  	return 0;
> diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
> index 41f8bb1f..1e28555 100644
> --- a/fs/orangefs/orangefs-kernel.h
> +++ b/fs/orangefs/orangefs-kernel.h
> @@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
>  {
>  	spin_lock(&op->lock);
>  	if (unlikely(op_is_cancel(op))) {
> +		list_del(&op->list);
>  		spin_unlock(&op->lock);
>  		put_cancel(op);
>  	} else {
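
For what it's worth, the double-put scenario described above can be
boiled down to a self-contained toy program (plain C, compilable on its
own; the toy_* names are made up for illustration and are not the
orangefs API).  The point it demonstrates is the one in the analysis:
->release() decides whether to drop the bufmap reference based only on
whether a bufmap still *appears* to be installed, so a second release
that never took an install reference of its own still does a put while
the old instance's rundown is in flight:

/*
 * Toy sketch, not the orangefs code: a refcounted "bufmap" plus a
 * release path that puts it whenever one looks installed.
 */
#include <stdio.h>

struct toy_bufmap {
	int refcount;
};

static struct toy_bufmap the_map = { .refcount = 0 };
static struct toy_bufmap *installed;	/* non-NULL while a bufmap is installed */

static void toy_get(struct toy_bufmap *m)
{
	m->refcount++;
}

static void toy_put(struct toy_bufmap *m)
{
	if (--m->refcount == 0) {
		printf("bufmap freed\n");
		installed = NULL;
	} else if (m->refcount < 0) {
		printf("BUG: double put, refcount = %d\n", m->refcount);
	}
}

/* daemon's setup ioctl: install the bufmap, taking the reference
 * that release() is later supposed to drop */
static void toy_install(void)
{
	toy_get(&the_map);
	installed = &the_map;
}

/* control-device ->release(): drop the install reference *if* a
 * bufmap looks installed - the conditional put in question */
static void toy_release(void)
{
	if (installed)
		toy_put(&the_map);
}

int main(void)
{
	toy_install();		/* daemon #1 starts and installs the bufmap  */
	toy_get(&the_map);	/* in-flight I/O still holds a reference     */
	toy_release();		/* daemon #1 dies: install ref dropped, but  */
				/* the I/O ref keeps the bufmap "installed"  */

	/* daemon #2 starts, opens the control device, and is killed
	 * before it gets around to its own install ioctl ...        */
	toy_release();		/* ... yet release() sees "installed", puts  */

	toy_put(&the_map);	/* I/O finally drops its ref - but the map   */
				/* was already torn down out from under it   */
	return 0;
}

Running it prints "bufmap freed" while the in-flight reference is still
held, followed by the refcount underflow - the same shape of failure as
the oops on the orangefs_bufmap_put at the end of wait_for_direct_io.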