Last week, I wrote:
> Ralf Baechle wrote:
>> On Thu, Nov 15, 2007 at 11:26:06AM -0800, Kaz Kylheku wrote:
>>
>>> After backing out the nfsutils patch, the diskless node does boot.
>>>
>>> However, the original "exportfs -a" problem comes back!
>>>
>>> So this problem is not resolved simply by using the correct compat
>>> routine; it's deeper.
>>>
>>> Sigh.
>>
>> Thanks for testing anyway!
>
> I'm continuing to dig into the problem.
>
> The export logic doesn't even go through nfsctl() anyway, which is
> why I originally hadn't even suspected that syscall.
>
> The nfsexport() function in nfsutils first tries opening
> "/proc/net/rpc/nfsd.fh./channel". If that works, it uses that, via a
> text-based protocol. Only if that interface doesn't exist does it fall
> back on the nfsctl(NFSCTL_EXPORT, ...) interface.

Basically, the export table is being mismanaged. Simply restarting NFS
(service nfs restart) will cause this problem to appear.

When the system is first booted up and NFS is started in runlevel 3 by
the nfs init script, the exportfs command correctly populates the export
table based on the /etc/exports file.

However, after that, further management of the export table fails.
Doing an "exportfs -a" clears it out.

You can see the table in /proc/net/rpc/nfsd.export/content. Before the
operation, the table has valid entries. After the operation, it simply
clears out and stays empty.

This is in spite of the fact that the exportfs command seems to be doing
exactly what it did the first time when NFS was successfully started
(i.e. it's a kernel problem; user space is doing the same thing that
worked before).

I verified that by turning on various additional tracing with sysctl
(sunrpc.nfsd_debug), and I added some extra traces to the function that
adds exports (svc_export_parse) to view the messages that are coming
down the nfsd.fh/channel pipe in /proc.

So the summary is that this problem appears to be some kind of
corruption of the RPC cache for exports.
I did see the kernel crash with an alignment exception once while
reproducing the problem, but I haven't been able to reproduce the crash
since.
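
For anyone who wants to check for the same symptom, here is a rough
sketch of the before/after comparison described above. It is only an
illustration, not something taken from nfsutils: the helper names
(parse_export_table, lost_entries, check_reexport) are made up, and it
assumes that each entry in /proc/net/rpc/nfsd.export/content sits on its
own line and that lines starting with "#" are header comments.

```python
import subprocess
from pathlib import Path

# Path named earlier in this message. The line format handled below
# (one entry per line, "#" for comments) is an assumption.
EXPORT_TABLE = Path("/proc/net/rpc/nfsd.export/content")


def parse_export_table(text):
    """Return the non-blank, non-comment lines of an export table dump."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def lost_entries(before_text, after_text):
    """Return entries present before the operation but missing after it."""
    after = set(parse_export_table(after_text))
    return [e for e in parse_export_table(before_text) if e not in after]


def check_reexport(table=EXPORT_TABLE, command=("exportfs", "-a")):
    """Snapshot the table, run the command, and report vanished entries."""
    before = table.read_text()
    subprocess.run(list(command), check=True)  # needs root on a real system
    after = table.read_text()
    return lost_entries(before, after)
```

Run as root on an affected machine, check_reexport() should return every
entry that was in the table before "exportfs -a" if the table really
does get emptied, and an empty list on a healthy system.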