Re: [PATCH 1/4] nfs-utils: introduce new statd implementation (1st part)

Trond Myklebust <trond.myklebust@xxxxxxxxxx> · Wed, 09 Sep 2009 14:39:59 -0400

On Wed, 2009-09-09 at 14:29 -0400, Jeff Layton wrote:
> On Wed, 05 Aug 2009 19:30:04 -0400
> Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> 
> > On Wed, 2009-08-05 at 18:24 -0400, Chuck Lever wrote:
> > > On Aug 5, 2009, at 5:22 PM, Trond Myklebust wrote:
> > > > On Wed, 2009-08-05 at 14:26 -0400, Chuck Lever wrote:
> > > >> sqlite3 doesn't do anything special under the covers.  It uses only
> > > >> POSIX file access and locking calls, as far as I know.  So I think
> > > >> hosting /var on most well-behaved clustering file systems won't have
> > > >> any problem with this arrangement.
> > > >
> > > > So we're basically introducing a dependency on a completely new
> > > > library that will have to be added to boot partitions/nfsroot/etc,
> > > > and we have no real reason for doing it other than because we want
> > > > to move from using sync() to fsync()?
> > > >
> > > > Sounds like a NACK to me...
> > > 
> > > Which library are you talking about, libsqlite3 or libtirpc?  Because  
> > > NEITHER of those is in /lib.
> > 
> > libsqlite is the problem. Unlike libtirpc, its utility has yet to be
> > established.
> >
> 
> Sorry to revive this so late, but I think we need to come to some
> sort of resolution here. The only missing piece for client side IPv6
> support is statd...
> 
> I'm not sure I understand the objection to using libsqlite3 here. We
> certainly could roll our own routines to handle data storage, but why
> would we want to do so? sqlite3 is quite good at what it does. Why
> wouldn't we want to use it?

Backwards compatibility is one major reason. statd already exists, and
is in use out there. I shouldn't be forced to reboot all my clients when
I upgrade the nfs-utils package on my server.

Simplicity is another reason. WTF do we need a full SQL database, when
all we want to do is store 2 pieces of data (a hostname and a cookie)?
It isn't as if this has been a major problem for us previously.
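
For illustration only, here is a minimal sketch of what a plain file-based
record could look like, using the fsync()-before-rename() approach. The
function name store_record(), the "hostname cookie" line format and the
".tmp" suffix are assumptions made for this example; none of them are taken
from the existing statd code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from nfs-utils): store one monitored host's
 * name and cookie in "path" so that a crash leaves either the old file
 * or the complete new one, never an empty or partial file.
 * Returns 0 on success, -1 on failure with errno set.
 */
int store_record(const char *path, const char *hostname, const char *cookie)
{
	char tmp[4096], buf[1024];
	int fd, len, saved;

	/* Build the temporary file name and the record line. */
	len = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (len < 0 || (size_t)len >= sizeof(tmp)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	len = snprintf(buf, sizeof(buf), "%s %s\n", hostname, cookie);
	if (len < 0 || (size_t)len >= sizeof(buf)) {
		errno = EINVAL;
		return -1;
	}

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Simplification: a short write is reported as EIO, not retried. */
	if (write(fd, buf, (size_t)len) != (ssize_t)len) {
		errno = EIO;
		goto fail;
	}

	/* Force the data to stable storage *before* the rename. */
	if (fsync(fd) < 0)
		goto fail;
	if (close(fd) < 0) {
		fd = -1;
		goto fail;
	}
	fd = -1;

	/* Atomically replace any previous record with the new one. */
	if (rename(tmp, path) < 0)
		goto fail;
	return 0;

fail:
	saved = errno;
	if (fd >= 0)
		close(fd);
	unlink(tmp);
	errno = saved;
	return -1;
}
```

A caller would invoke it as store_record(path, hostname, cookie) once per
monitored host. A stricter version would probably also fsync() the containing
directory after the rename, so that the new directory entry itself is on
stable storage.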

> > > In any event, it's not just sync(2) that is a problem.  sync(2) by  
> > > itself is a boot performance problem, but it's the combination of  
> > > rename and sync that is known to be especially unreliable during  
> > > system crashes.  Statd, being a crash monitor, shouldn't depend on  
> > > rename/sync to maintain persistent data in the face of system  
> > > instability.  I'd call that a real reason to use something more robust.
> > 
> > What are you talking about? Is this about the truncate + rename issue
> > leaving empty files upon a crash?
> > That issue is solved trivially by doing an fsync() before you rename the
> > file. That entire discussion was about whether or not existing
> > applications should be _required_ to do this kind of POSIX pedantry,
> > when previously they could get away without it.
> > 
> > IOW: that issue alone does not justify replacing the current simple file
> > based scheme.
> >
> 
> There are other reasons not to use the simple file-based scheme, too...
> 
> Internationalized domain names will be easier to deal with via sqlite3,
> for instance.

Please explain...

> Certainly we could code this up ourselves, but what's the benefit to
> doing that when we have a perfectly good data storage engine available?

Why change something that works???? Rewriting from scratch is _NOT_ the
Linux way, and has usually bitten us hard when we've done it.

The 2.6.19 rewrite of the kernel mount code springs to mind...

Trond
