Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load

On Fri, 2021-10-15 at 08:03 +0000, Trond Myklebust wrote:
> On Fri, 2021-10-15 at 09:51 +1100, NeilBrown wrote:
> > On Fri, 15 Oct 2021, Trond Myklebust wrote:
> > > On Tue, 2021-10-12 at 08:57 +1100, NeilBrown wrote:
> > > > On Tue, 12 Oct 2021, Chuck Lever III wrote:
> > > > > 
> > > > > Scott seems well positioned to identify a reproducer. Maybe we
> > > > > can give him some likely candidates for possible bugs to explore
> > > > > first.
> > > > 
> > > > Has this patch been tried?
> > > > 
> > > > NeilBrown
> > > > 
> > > > 
> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > index c045f63d11fa..308f5961cb78 100644
> > > > --- a/net/sunrpc/sched.c
> > > > +++ b/net/sunrpc/sched.c
> > > > @@ -814,6 +814,7 @@ rpc_reset_task_statistics(struct rpc_task *task)
> > > >  {
> > > >         task->tk_timeouts = 0;
> > > >         task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_SENT);
> > > > +       clear_bit(RPC_TASK_SIGNALLED, &task->tk_runstate);
> > > >         rpc_init_task_st
> > > 
> > > We shouldn't automatically "unsignal" a task once it has been told
> > > to die. The correct thing to do here should rather be to change
> > > rpc_restart_call() to exit early if the task was signalled.
> > > 
> > 
> > Maybe.  It depends on exactly what the signal meant
> > (rpc_killall_tasks() is a bit different from getting a SIGKILL), and
> > exactly what the task is trying to achieve.
> > 
> > Before Commit ae67bd3821bb ("SUNRPC: Fix up task signalling")
> > that is exactly what we did.
> > If we want to change the behaviour of a task responding to
> > rpc_killall_tasks(), we should clearly justify it in a patch doing
> > exactly that.
> > 
> 
> The intention behind rpc_killall_tasks() never changed, which is why it

("it" being the error ERESTARTSYS)

> is listed in nfs_error_is_fatal(). I'm not aware of any case where we
> deliberately override in order to restart the RPC call on an
> ERESTARTSYS error.
> 
> 
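
For illustration only, the alternative suggested earlier in the thread
(have rpc_restart_call() exit early if the task was signalled, rather
than clearing the signalled bit) can be modelled in userspace roughly
as below. This is a sketch under assumptions, not the kernel code:
struct mock_task, MOCK_TASK_SIGNALLED and the mock_* helpers are
hypothetical stand-ins for struct rpc_task, RPC_TASK_SIGNALLED and the
real SUNRPC functions, and the return convention (1 = restarted,
0 = refused) is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RPC_TASK_SIGNALLED runstate bit. */
enum { MOCK_TASK_SIGNALLED = 1u << 0 };

/* Hypothetical, heavily reduced stand-in for struct rpc_task. */
struct mock_task {
	unsigned int runstate;	/* bit flags, e.g. MOCK_TASK_SIGNALLED */
	int status;		/* last error, 0 when none */
	int restarts;		/* how many times the call was restarted */
};

/* Mark the task as having been told to die. */
static void mock_signal_task(struct mock_task *t)
{
	t->runstate |= MOCK_TASK_SIGNALLED;
}

static bool mock_task_signalled(const struct mock_task *t)
{
	return (t->runstate & MOCK_TASK_SIGNALLED) != 0;
}

/*
 * Restart the call unless the task was signalled.
 * Returns 1 if the call was restarted, 0 if the restart was refused.
 * The signalled bit is never cleared here, so a task that has been
 * told to die stays that way and keeps its error status.
 */
static int mock_restart_call(struct mock_task *t)
{
	if (mock_task_signalled(t))
		return 0;

	t->status = 0;
	t->restarts++;
	return 1;
}
```

In this model, a task that was never signalled restarts normally, while
a signalled task keeps its status and is not restarted, however often
the caller retries.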

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx