Re: kernel hacker's pub night

On Thu, Jun 26, 2008 at 05:05:44PM -0400, Chuck Lever wrote:
> On Thu, Jun 26, 2008 at 1:55 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > On Thu, Jun 26, 2008 at 01:42:58PM -0400, Chuck Lever wrote:
> >> On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
> >>> Benny Halevy <bhalevy@xxxxxxxxxxx> wrote on 06/23/2008 06:10:40 PM:
> >>>
> >>>> Apparently the file is cached.  You need to restart nfs and remount
> >>>> the file system to make sure it isn't cached before reading it.
> >>>> Or, you can create a file larger than your host's cache size so that
> >>>> when you write (or read) it sequentially, its tail evicts its head
> >>>> out of the cache.  This is a less reliable method, but creating a
> >>>> file about 25% larger than the host's memory size should work for
> >>>> you.
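
For reference, a rough, untested sketch (in python) of that second
approach: size the test file off MemTotal from /proc/meminfo and stream
it through once for write and once for read.  The path and chunk size
below are only placeholders for your setup.

#!/usr/bin/env python
# Untested sketch: write and then read back a file ~25% larger than RAM
# so its tail evicts its head from the page cache.
import os, time

def mem_total_bytes():
    # MemTotal is reported in kB in /proc/meminfo.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found")

path = "/mnt/nfs/bigfile"          # placeholder: point at your NFS mount
size = int(mem_total_bytes() * 1.25)
chunk = b"\0" * (1 << 20)          # write 1 MiB at a time

start = time.time()
written = 0
with open(path, "wb") as f:
    while written < size:
        f.write(chunk)
        written += len(chunk)
    f.flush()
    os.fsync(f.fileno())
print("write: %.1f MB/s" % (written / (time.time() - start) / 1e6))

start = time.time()
nread = 0
with open(path, "rb") as f:
    while True:
        buf = f.read(1 << 20)
        if not buf:
            break
        nread += len(buf)
print("read:  %.1f MB/s" % (nread / (time.time() - start) / 1e6))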
> >>>
> >>> I did a umount of all filesystems and restart NFS before testing. Here
> >>> is the result:
> >>>
> >>> Local:
> >>>      Read:  69.5 MB/s
> >>>      Write: 70.0 MB/s
> >>> NFS of same FS mounted loopback on same system:
> >>>      Read:  29.5 MB/s  (57% drop)
> >>>      Write: 27.5 MB/s  (60% drop)
> >>>
> >>> The drops seem exceedingly high. How can I figure out the source of
> >>> the problem? Even if the answer is only as general as "the problem is
> >>> in the NFS client code", "the problem is in the NFS server code", or
> >>> "the problem can be mitigated by tuning" :-)
> >>
> >> It's hard to say what might be the problem just by looking at
> >> performance results.
> >>
> >> You can look at client-side NFS and RPC performance metrics using some
> >> prototype Python tools that were just added to nfs-utils.  The scripts
> >> themselves can be downloaded from:
> >>
> >>    http://oss.oracle.com/~cel/Linux-2.6/2.6.25
> >>
> >> but unfortunately they are not fully documented yet so you will have to
> >> approach them with an open mind and a sense of experimentation.
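
In case it saves someone a download, here is a rough, untested sketch of
the sort of thing those scripts report: per-op RPC counts and average
round-trip times pulled out of /proc/self/mountstats.  The field offsets
below are from memory and may vary between kernel versions, so
double-check them against your kernel before trusting the numbers.

#!/usr/bin/env python
# Untested sketch: dump per-op RPC counts and average RTT for each NFS
# mount listed in /proc/self/mountstats.
def dump_mountstats(path="/proc/self/mountstats"):
    in_ops = False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("device "):
                in_ops = False
                if " fstype nfs" in line:
                    print(line)
            elif line == "per-op statistics":
                in_ops = True
            elif in_ops and ":" in line:
                op, rest = line.split(":", 1)
                fields = rest.split()
                if len(fields) >= 7:
                    ops = int(fields[0])
                    rtt_ms = int(fields[6])   # cumulative RTT in ms (assumed offset)
                    if ops:
                        print("  %-16s %8u ops  avg RTT %.2f ms"
                              % (op, ops, float(rtt_ms) / ops))

if __name__ == "__main__":
    dump_mountstats()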
> >>
> >> You can also capture network traces on your loopback interface to see if
> >> there is, for example, unexpected congestion or latency, or if there are
> >> other problems.
> >>
> >> But for loopback, the problem is often that the client and server are
> >> sharing the same physical memory for caching data.  Analyzing your test
> >> system's physical memory utilization might be revealing.
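
If it helps with that, a trivial (untested) sampler that prints a few
/proc/meminfo counters once a second; running it alongside the transfer
should show whether the client and server caches really are squeezing
each other out of RAM.

#!/usr/bin/env python
# Untested sketch: watch page cache and dirty memory while the test runs.
import time

FIELDS = ("MemFree", "Cached", "Dirty", "Writeback")

def snapshot():
    vals = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key = line.split(":")[0]
            if key in FIELDS:
                vals[key] = int(line.split()[1])   # values are in kB
    return vals

print("  ".join("%10s" % k for k in FIELDS))
while True:
    snap = snapshot()
    print("  ".join("%7u MB" % (snap.get(k, 0) // 1024) for k in FIELDS))
    time.sleep(1)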
> >
> > If he's just doing a single large read or write with cold caches (sounds
> > like that's probably the case), then memory probably doesn't matter
> > much, does it?
> 
> I expect it might.
> 
> The client and server would contend for available physical memory as
> the file was first read in from the physical file system by the
> server, and then a second copy was cached by the client.
> 
> A file as small as half the available physical memory on his system
> could trigger this behavior.

So, forgive me for being naive about this stuff, but I would've thought
that the cached pages (which have been read once and then never touched
again) would just be discarded, and life would continue.  Otherwise how
would the kernel be able to get acceptable streaming read performance in
any situation? 

This doesn't sound fundamentally different, e.g., from doing streaming
reads of two files on different filesystems at once.

> On older 2.6 kernels (.18 or so), both the server's physical file
> system and the client would trigger bdi congestion throttling.

How does that work?

--b.
