Re: reboot recovery

On 03/09/2010 03:53 PM, J. Bruce Fields wrote:
On Tue, Mar 09, 2010 at 12:39:35PM -0500, Chuck Lever wrote:
Thanks, this is very clear.

On 03/08/2010 08:46 PM, J. Bruce Fields wrote:
The Linux server's reboot recovery code has long-standing architectural
problems, fails to adhere to the specifications in some cases, and does
not yet handle NFSv4.1 reboot recovery.  An overhaul has been a
long-standing todo.

This is my attempt to state the problem and a rough solution.

Requirements
^^^^^^^^^^^^

Requirements, as compared to current code:

	- Correctly implements the algorithm described in section 8.6.3
	  of rfc 3530, and eliminates known race conditions on recovery.
	- Does not attempt to manage files and directories directly from
	  inside the kernel.
	- Supports RECLAIM_COMPLETE.

Requirements, in more detail:

A "server instance" is the lifetime from start to shutdown of a server;
a reboot ends one server instance and starts another.

It would be better if you architected this not in terms of a server
reboot, but in terms of "service nfs stop" and "service nfs start".

Good point; fixed in my local copy.

(Though that may only work for v4-only servers, since I think v2/v3 may
still have problems with restarts that don't restart everything
(including the client).)

Well, eventually I hope to address some of those issues. But there's no use tying our NFSv4 work to the problems of the v2/v3 implementation.

Draft design
^^^^^^^^^^^^

We will modify rpc.statd to manage this state in userspace.

Please don't.  statd is ancient, crufty code that is already barely able
to do what it needs to do.

statd is single-threaded.  It makes dozens of blocking DNS calls to
handle NSM protocol requests.  It makes NLM downcalls on the same thread
that handles everything else.  Unless an effort is undertaken to make
statd multithreaded, this extra work could cause significant latency for
handling upcalls.

Hm, OK.  I guess I don't want to make this project dependent on
rewriting statd.

So, other possibilities:
	- Modify one of the other existing userland daemons.
	- Make a separate daemon just for this.
	- Ditch the daemon entirely and depend mainly on hotplug-like
	  invocations of a userland program that exits after it handles
	  a single call (roughly sketched below).
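
To make that third option concrete, here is a rough sketch of such a
one-shot helper: it reads a single request from a channel file, answers
it, and exits, leaving the kernel (or udev) to re-invoke it per event.
The channel path and the echo-style reply are placeholders I made up,
not a proposed interface.

/*
 * Hypothetical one-shot recovery helper.  The channel path below is a
 * placeholder; the real file(s) would live in the "nfsd" filesystem
 * described later in this design.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	char req[4096];
	FILE *chan = fopen("/proc/fs/nfsd/reclaim_channel", "r+");

	if (!chan)
		exit(1);

	/* Read exactly one newline-terminated request... */
	if (!fgets(req, sizeof(req), chan))
		exit(1);

	/* ...consult stable storage about the client (elided here)... */

	/* ...write the answer back and exit; we are re-invoked per event. */
	fputs(req, chan);
	fflush(chan);
	fclose(chan);
	return 0;
}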

Previous prototype code from CITI will be considered as a starting
point.

Kernel<->user communication will use four files in the "nfsd"
filesystem.  All of them will use the encoding used for rpc cache
upcalls and downcalls, which consist of whitespace-separated fields
escaped as necessary to allow binary data.
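
For reference, that record format is one line per message, fields
separated by single spaces, with whitespace and other unsafe bytes
escaped.  Below is a minimal writer sketch modeled on the qword_print()
helper in nfs-utils' support/nfs/cacheio.c; the exact escape set and
the field contents are illustrative assumptions, not the final format.

#include <ctype.h>
#include <stdio.h>

/* Emit one field followed by a separating space, escaping whitespace,
 * backslashes, and non-printable bytes as \ooo octal so the reader can
 * split records on unescaped spaces. */
static void qword_print(FILE *f, const char *word)
{
	const unsigned char *p;

	for (p = (const unsigned char *)word; *p; p++) {
		if (isspace(*p) || *p == '\\' || !isprint(*p))
			fprintf(f, "\\%03o", *p);
		else
			fputc(*p, f);
	}
	fputc(' ', f);
}

int main(void)
{
	/* Hypothetical downcall: a client identifier plus opaque data. */
	qword_print(stdout, "client-123");
	qword_print(stdout, "opaque\tblob");
	fputc('\n', stdout);		/* one record per line */
	return 0;
}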

In general, we don't want to mix RPC listeners and upcall file
descriptors.  mountd has to access the cache file descriptors to satisfy
MNT requests, so there is a reason to do so in that case.  Here there is
no reason to mix the two; it only adds needless implementation
complexity and unnecessary security exposure.

Yesterday, it was suggested that we split mountd into a piece that
handled upcalls and a piece that handled remote MNT requests via RPC.
Weren't you the one who argued in favor of getting rid of daemons called
"rpc.foo" for NFSv4-only operation? :-)

Yeah.  So I guess a subcase of the second option above would be to name
the new daemon "nfsd-userland-helper" (or something similarly generic)
and eventually make it handle export upcalls too.  I don't know.

I wasn't thinking of a single daemon for this stuff, necessarily, but rather a single framework that can easily be fitted to whatever task is needed. Just alter a few constants, specify the arguments and their types, add boiling water, type 'make', and fluff with fork.

We've already got referral/DNS, idmapper, gss, and mountd upcalls, and they all seem to do it differently from each other.

--
chuck[dot]lever[at]oracle[dot]com