Re: [PATCH 1/4] nfs-utils: introduce new statd implementation (1st part)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sep 9, 2009, at 2:39 PM, Trond Myklebust wrote:
On Wed, 2009-09-09 at 14:29 -0400, Jeff Layton wrote:
On Wed, 05 Aug 2009 19:30:04 -0400
Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:

On Wed, 2009-08-05 at 18:24 -0400, Chuck Lever wrote:
On Aug 5, 2009, at 5:22 PM, Trond Myklebust wrote:
On Wed, 2009-08-05 at 14:26 -0400, Chuck Lever wrote:
sqlite3 doesn't do anything special under the covers. It uses only POSIX file access and locking calls, as far as I know. So I think hosting /var on most well-behaved clustering file systems won't have
any problem with this arrangement.

So we're basically introducing a dependency on a completely new
library
that will have to be added to boot partitions/nfsroot/etc, and we have no real reason for doing it other than because we want to move from
using sync() to fsync()?

Sounds like a NACK to me...

Which library are you talking about, libsqlite3 or libtirpc? Because
NEITHER of those is in /lib.

libsqlite is the problem. Unlike libtirpc, it's utility has yet to be
established.


Sorry to revive this so late, but I think we need to come to some
sort of resolution here. The only missing piece for client side IPv6
support is statd...

I'm not sure I understand the objection to using libsqlite3 here. We
certainly could roll our own routines to handle data storage, but why
would we want to do so? sqlite3 is quite good at what it does. Why
wouldn't we want to use it?

Backwards compatibility is one major reason. statd already exists, and
is in use out there. I shouldn't be forced to reboot all my clients when
I upgrade the nfs-utils package on my server.

The old statd still exists in nfs-utils. The new statd is an entirely separate component. Distributions can continue to use the old statd as long as they want. This is a red herring.

Simplicity is another reason. WTF do we need a full SQL database, when
all we want to do is store 2 pieces of data (a hostname and a cookie)?
It isn't as if this has been a major problem for us previously.

Because we are not storing just a hostname and a cookie. We are storing several different data items for each host, and we need to search over the records, and provide uniqueness constraints, and handle data conversion (for binary data like the cookie, for string data like the hostname, and for integers, like the prog/vers/proc tuple). We need to store them durably on persistent storage to have some protection against crashes. These are all things that an embedded database can do well, and that we therefore don't have to code ourselves.

Simplicity is in the eye of the beholder. I can easily counterargue that it's simpler to rely on a library than to duplicate the functionality in statd. If we think it is appropriate to use glibc for managing a memory heap rather than open coding it, and libtirpc for handling RPC calls rather than open coding it, then why shouldn't we use another library for managing our host records on disk, rather than open coding our data conversion and record searching logic?

Again, this is a red herring.

In any event, it's not just sync(2) that is a problem.  sync(2) by
itself is a boot performance problem, but it's the combination of
rename and sync that is known to be especially unreliable during
system crashes.  Statd, being a crash monitor, shouldn't depend on
rename/sync to maintain persistent data in the face of system
instability. I'd call that a real reason to use something more robust.

What are you talking about? Is this about the truncate + rename issue
leaving empty files upon a crash?
That issue is solved trivially by doing an fsync() before you rename the
file. That entire discussion was about whether or not existing
applications should be _required_ to do this kind of POSIX pedantry,
when previously they could get away without it.

IOW: that issue alone does not justify replacing the current simple file
based scheme.


There are other reasons, not to use the simple file-based scheme too...

Internationalized domain names will be easier to deal with via sqlite3,
for instance.

Please explain...

IPv6 is used in Asia, where they almost certainly need to use non- ASCII characters in their hostnames. Internationalized domain names are stored in double-wide character sets. To provide reliable support for IDNs in statd, we will have to guarantee somehow that we can store an IDN as a file name (if we want to stay with the current scheme), no matter what file system is used for /var.

What's more, multi-homed host support will need to store multiple records for the same hostname. The mon_name is the same, but my_name is different, for each of these records. So we could do that by adding more than one line in each hostname file, but it's also a simple matter to set this up in SQL.

When we want to have statd remember things like multiple addresses for the same hostname, or whether the remote is a client or server, we will need to make more adjustments to the files.

As we get more and more new requirements, why lock ourselves into the current on-disk format? Using statd means we can store new fields and new records without any backwards-compatibility issues. It's all handled by the database code. So, we can think about the high level problem of getting statd to behave correctly rather than worry about the details of exactly how we are going to get the next data item stored in our current files in a backward compatible way.

Certainly we could code this up ourselves, but what's the benefit to
doing that when we have a perfectly good data storage engine available?

Why change something that works???? Rewriting from scratch is _NOT_ the
Linux way, and has usually bitten us hard when we've done it.

Because we are adding a bunch of new feature requirements. Internationalized domain names, multi-homed host support, IPv6 and TI- RPC, fast boot times, keeping better track of remote host addresses, keeping track of which remotes are clients and which are servers, and support for sending notifications via TCP all require significant modifications to this code base.

At some point you have to look at the code you have, and decide it's simply not going to be adequate, going forward.

The 2.6.19 rewrite of the kernel mount code springs to mind...

One can just as easily argue that we've been bitten hard precisely because we've let things rot, or because we have inadequate testing for these components.

Another red herring, and especially annoying because you've known I was rewriting statd for months. Only now, when I'm done, do you say "rewriting is not the Linux way."

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux