Re: [PATCH/RFC] NFS: state manager thread must start running.

NeilBrown <neilb@xxxxxxx> · Mon, 21 Jul 2014 13:35:51 +1000

On Tue, 15 Jul 2014 10:51:11 -0400 Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello, Neil.
> 
> On Tue, Jul 15, 2014 at 06:13:17PM +1000, NeilBrown wrote:
> > Could do that (or per-client) but it doesn't really buy us anything does it?
> 
> It does buy some.
> 
> 1. The kworker threads are more likely to be cache-hot than explicit
>    kthreads.
> 
> 2. Workqueue is a lot easier to get right in terms of synchronization
>    and freezing.
> 
> 3. Workqueue mandates well-defined boundaries between separate
>    execution instances which often makes it a lot easier to implement
>    and update kernel-wide features such as freezer and runtime
>    kernel patching.
> 
> > The state manager assumes it is single threaded, so it would need to be
> > a single-threaded workqueue with always at least one thread running.
> > That is much the same as a kthread.
> > 
> > And then there is the fact that the current code explicitly enables SIGKILL
> > and maybe that is important.
> 
> If SIGKILL handling is mandatory (really?), kthread_worker can be used
> for #2 and #3.
> 
> Thanks.
> 

(kthread_worker doesn't seem to be very well documented, but I think I see
what it does).

The only reason I can think of that SIGKILL might be important is that when
a server is not responding, a process that is trying to talk to it will only
give up if it gets a fatal signal.  So if state recovery starts for a server
that cannot be contacted, the thread doing the recovery will block until the
server comes back or until it receives SIGKILL.
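
To make that blocking behaviour concrete, here is a rough userspace analogy,
not NFS client code.  Every name in it (wait_for_server, install_fatal_handler,
the two probe functions) is hypothetical, and SIGTERM stands in for SIGKILL
because a userspace process cannot catch SIGKILL.  It only illustrates a wait
that has no timeout and ends either when the server answers or when a signal
arrives.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Set by the handler; stands in for a "fatal signal pending" check. */
static volatile sig_atomic_t fatal_signal_seen;

static void on_fatal_signal(int signo)
{
	(void)signo;
	fatal_signal_seen = 1;
}

/* Hypothetical helper: SIGTERM is used because SIGKILL cannot be caught. */
static int install_fatal_handler(void)
{
	return signal(SIGTERM, on_fatal_signal) == SIG_ERR ? -1 : 0;
}

/* Hypothetical probe type: returns true once the "server" has answered. */
typedef bool (*probe_fn)(void *arg);

/*
 * Retry until the probe succeeds or a signal has been seen.
 * There is deliberately no timeout: without a signal, an unreachable
 * server keeps the caller here forever.
 * Returns 0 on success, -EINTR if a signal ended the wait.
 */
static int wait_for_server(probe_fn probe, void *arg)
{
	for (;;) {
		if (fatal_signal_seen)
			return -EINTR;
		if (probe(arg))
			return 0;
	}
}

/* Example probe: the server "comes back" on the third attempt. */
static bool answers_on_third_try(void *arg)
{
	int *tries = arg;

	return ++*tries >= 3;
}

/*
 * Example probe: the server never answers.  On the third attempt it
 * raises SIGTERM itself, simulating a signal broadcast at shutdown.
 */
static bool never_answers(void *arg)
{
	int *tries = arg;

	if (++*tries == 3)
		raise(SIGTERM);
	return false;
}
```

With the handler installed, wait_for_server(answers_on_third_try, &tries)
returns 0, while wait_for_server(never_answers, &tries) only returns (with
-EINTR) because the signal arrived.  If the handler is not installed and no
signal is delivered, the second call never returns, which is the situation
described above.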

I cannot see anything that would generate such a SIGKILL except the broadcast
SIGKILL at shutdown.
So maybe the purpose of
	allow_signal(SIGKILL);
is to ensure that when the machine is shut down, the -manager thread actually
dies.
But I'm not confident of this explanation.  If this were the issue I would
expect nfs_umount_begin to be sending SIGKILL too.  But it just does
rpc_killall_tasks; maybe that is enough.  If so, is SIGKILL really needed?

Trond: can you provide some wisdom?  Is SIGKILL important for the manager
threads?
If so, would you prefer a thread that uses kthread_worker, or one that works
more like the current code?
If not, would you be happy with a fully workqueue based solution?

Thanks,
NeilBrown
