Re: Orangefs ABI documentation

I added the list_del...

Everything is very resilient. I killed the
client-core over and over while dbench and
ls -R were running at the same time, and the
client-core always restarted... until finally,
it didn't. I guess that depends on just what was
going on at the time of the kill. That time it hit
the WARN_ON in service_operation, and then oopsed
on the orangefs_bufmap_put call down at the end of
wait_for_direct_io...

http://myweb.clemson.edu/~hubcap/after.list_del

-Mike

On Sat, Feb 13, 2016 at 9:56 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Feb 13, 2016 at 05:47:38PM +0000, Al Viro wrote:
>> On Sat, Feb 13, 2016 at 12:18:12PM -0500, Mike Marshall wrote:
>> > I added the patches, and ran a bunch of tests.
>> >
>> > Stuff works fine when left unbothered, and also
>> > when wrenches are thrown into the works.
>> >
>> > I had multiple userspace things going on at the
>> > same time, dbench, ls -R, find... kill -9 or control-C on
>> > any of them is handled well. When I killed both
>> > the client-core and its restarter, the kernel
>> > dealt with a swarm of ops that had nowhere
>> > to go... the WARN_ON in service_operation
>> > was hit.
>> >
>> > Feb 12 16:19:12 be1 kernel: [ 3658.167544] orangefs: please confirm
>> > that pvfs2-client daemon is running.
>> > Feb 12 16:19:12 be1 kernel: [ 3658.167547] fs/orangefs/dir.c line 264:
>> > orangefs_readdir: orangefs_readdir_index_get() failure (-5)
>>
>> I.e. bufmap is gone.
>>
>> > Feb 12 16:19:12 be1 kernel: [ 3658.170741] ------------[ cut here ]------------
>> > Feb 12 16:19:12 be1 kernel: [ 3658.170746] WARNING: CPU: 0 PID: 1667
>> > at fs/orangefs/waitqueue.c:203 service_operation+0x4f6/0x7f0()
>>
>> ... and we are in wait_for_direct_io(), holding an r/w slot and finding
>> ourselves with bufmap already gone, despite not having freed that slot
>> yet.  Bloody wonderful - we still have bufmap refcounting buggered somewhere.
>>
>> Which tree had that been?  Could you push that tree (having checked that
>> you don't have any uncommitted changes) in some branch?
>
> OK, at the very least there's this; should be folded into "orangefs: delay
> freeing slot until cancel completes"
>
> diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
> index 41f8bb1f..1e28555 100644
> --- a/fs/orangefs/orangefs-kernel.h
> +++ b/fs/orangefs/orangefs-kernel.h
> @@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
>  {
>         spin_lock(&op->lock);
>         if (unlikely(op_is_cancel(op))) {
> +               list_del(&op->list);
>                 spin_unlock(&op->lock);
>                 put_cancel(op);
>         } else {
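
For readers following the list_del change quoted above, here is a minimal userspace sketch of the general pattern it applies: unlink an object from the list that still points at it before dropping what may be the last reference. This is not the orangefs code. The list helpers, struct demo_op, demo_set_purged() and the other demo_* names are hypothetical stand-ins, free() stands in for dropping the last reference, the non-cancel branch is invented for illustration, and the spinlock around the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for a circular doubly linked list. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = NULL;
}

/* Hypothetical op; only the fields this sketch needs. */
struct demo_op {
	struct list_node list;
	bool is_cancel;
	bool purged;
};

static struct demo_op *demo_op_new(bool is_cancel, struct list_node *queue)
{
	struct demo_op *op = calloc(1, sizeof(*op));

	if (!op)
		return NULL;
	op->is_cancel = is_cancel;
	list_add_tail_node(&op->list, queue);
	return op;
}

/*
 * Same shape as the patched branch: a cancel op is unlinked first and
 * only then released. Without the unlink, the queue would keep pointing
 * at freed memory. The else branch is an assumption made for the sketch.
 */
static void demo_set_purged(struct demo_op *op)
{
	if (op->is_cancel) {
		list_del_node(&op->list);
		free(op);
	} else {
		op->purged = true;
	}
}

/* Returns true if purging a queued cancel op leaves the queue empty. */
static bool demo_purge_cancel_leaves_queue_empty(void)
{
	struct list_node queue;
	struct demo_op *op;

	list_init(&queue);
	op = demo_op_new(true, &queue);
	if (!op)
		return false;
	demo_set_purged(op);
	return list_is_empty(&queue);
}
```

Calling demo_purge_cancel_leaves_queue_empty() from a test harness checks that nothing on the queue refers to the freed op afterwards.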


