Re:

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 9 May 2009 09:44:28 -0700 (PDT)

On Fri, 8 May 2009, Brandon Casey wrote:
> 
> btw, I've since done some more testing on some centos5.3 boxes we have.
> I get similar results (less ancient kernel 2.6.18).

Yes, 2.6.18 is still much too old to matter from a locking standpoint. 

When people initially worried about scalability, the issues were more 
about server side stuff and the cached cases. NFS (as a client) is 
certainly used on the server side too, but it tends to be a somewhat 
secondary worry where only specific parts really matter. So people worked 
a lot more on the core kernel, and on local high-performance filesystem 
scaling.

Only lately have we been pretty aggressive about finally really getting 
rid of the old "single big lock" (BKL) model entirely, or moving outwards 
from the core.

And while we removed the BKL from the normal NFS read/write paths long 
long ago, all the name lookup and directory handling code still had it 
until a year ago.

That, btw, is directly explained by perceived scalability issues: NFS is 
fairly often used as the backing store for a database and scaling thus 
matters there. But databases tend to keep their few big files open and use 
pread/pwrite - so pathname lookup is not nearly as significant for server 
ops as plain read/write.
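
To make that access pattern concrete, here is a minimal sketch of the "open once, then pread" style described above. It is only an illustration: the helper names `read_records` and `demo` and the temporary file are made up for the example and do not come from any real database. `os.pread` is the Python wrapper around the POSIX call and is only available on Unix-like systems.

```python
import os
import tempfile


def read_records(path, offsets, size):
    """Open `path` once, then read `size` bytes at each offset (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)  # the only pathname lookup happens here
    try:
        # pread takes an explicit offset and works on the already-open
        # descriptor, so no further name lookup is involved.
        return [os.pread(fd, size, off) for off in offsets]
    finally:
        os.close(fd)


def demo():
    """Write a small scratch file and read three 4-byte records out of order."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"AAAABBBBCCCCDDDD")
        path = f.name
    try:
        return read_records(path, [8, 0, 12], 4)
    finally:
        os.unlink(path)
```

However many reads follow, the name is resolved exactly once, which is why the lookup path sees so little of this kind of load.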

(Pathname lookup is important for things like web servers etc, but they 
rely heavily on caching for that, and the cached case scales fine).

> I've also scanned through the errata announcements that RedHat has 
> released for their kernel updates.  A few of them involve NFS.  
> Possibly, whatever RedHat modified in the 5.X kernel was also backported 
> to the 4.X kernel.

That is very possibly the case. Expanding the BKL usage in some cases could 
easily cause the lock to become contended - and the way lock contention 
works, once you get even just a small _hint_ of contention, things often 
fall off a cliff. The contention slows locking down, which in turn causes 
more CPU usage, which in turn causes _more_ contention.

So even a small amount of extra locking - or even just slowing down some 
code that was inside the lock - can have catastrophic behavioural changes 
when the lock is close to being a problem. You do not get a nice gradual 
slowdown at all - you just hit a hard wall.
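
The "hard wall" shape can be illustrated with a rough back-of-the-envelope model. The sketch below treats the lock as a single-server queue (the textbook M/M/1 approximation). That model is an assumption brought in for illustration: it is not from this message and it is not a measurement of the BKL or of NFS. The function names and the numbers in the note that follows are likewise made up.

```python
def mean_lock_latency(arrival_rate, hold_time):
    """Mean time to get through the lock, assuming an M/M/1 queue.

    arrival_rate: lock acquisitions attempted per second (assumed value)
    hold_time:    seconds the lock is held per acquisition (assumed value)
    """
    utilization = arrival_rate * hold_time
    if utilization >= 1.0:
        # Demand exceeds what the lock can serve: the queue never drains.
        return float("inf")
    return hold_time / (1.0 - utilization)


def latency_table(arrival_rate, hold_times):
    """Pair each hold time with the latency the model predicts for it."""
    return [(h, mean_lock_latency(arrival_rate, h)) for h in hold_times]
```

With 900 acquisitions per second, a hold time of 0.5 ms gives a latency of under 1 ms, 1.0 ms gives about 10 ms, and 1.1 ms gives about 110 ms. So in this model, code inside the lock that gets 10% slower makes the lock roughly eleven times slower to get through. The model does not include the feedback described above, where waiting itself burns CPU, so if anything it understates how sharp the wall is.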

I guess I should really try to set up some fileserver here at home to 
improve my test coverage. And to do better backups (of the little private 
data I have that I can't just mirror out to the world by turning it into 
an open-source project ;^)

				Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html