I've been digging into a seemingly difficult performance issue over the last few days. We're running glusterfs mainline 2.5 patch 628, fuse-2.7.2 with Marian's glfs patch, and kernel 2.6.23, currently with one server and two clients (soon to be two and four, respectively). The server is a dual-core Opteron with a single SATA2 disk (we're planning on AFR for redundancy); the clients are dual-core Intel machines. The network transport is gigabit ethernet. The server is 32-bit and the clients are 64-bit (I can rebuild the server, no problem, if that is the issue). Throughput is good, and activity by a single process seems to work fine.

Our issue is with a PHP script running on the clients from the glusterfs share. The script has a number of includes, and those files have a few more includes, which means a lot of stat() calls as the webserver checks that none of the files have changed. If we make one call to the script, everything is fine - the code completes in 300ms (from local disk the code completes in 100ms). Similarly, if you run "ls -l" on a large directory (1700 files), everything appears to work fine.

However, if we make two concurrent calls to the PHP script, or run two copies of "ls -l" on the large directory, everything slows down by an order of magnitude. The output of the ls commands appears to stutter on each copy - usually one will stop and the other will start, but sometimes both will stop for a second or two. Adding a third process makes it worse. The PHP script takes 2.5 or 3 seconds to complete instead of 300ms, and again, more requests make it worse - with four concurrent requests the finish time jumps to 7 seconds. The issue occurs whether the two processes run on a single client or one on each of two clients.

Inserting the trace translator doesn't turn up anything unusual that I can see, except that it makes the processes run even slower (which is expected, of course). A tcpdump of the filesystem traffic shows inexplicable gaps of 100ms or more with no traffic. The single-process "ls -l" test does not show these gaps.

I stripped the server and client down to the bare minimum with unify, which didn't seem to make a difference. I'm currently running this server/client stack, also without success:

server:
  ns
  brick (x2)
  posix-locks
  io-threads (16, 64MB)
  server (ns, brick1, brick2)

client:
  brick1
  brick2
  unify (alu)
  io-threads (16, 64MB)
  io-cache (256MB)

At various times I've tried read-ahead with no discernible difference. An strace of the client process doesn't return anything interesting except a lot of these:

  futex(0x12345678, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)

These also appear during a single-process test, but they are much more prevalent when two processes are running.

What am I doing wrong? :)
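
In case it helps, here is roughly what that stack looks like as spec files. This is a sketch written from memory rather than a copy of the live configs: the volume names, export paths, and server address are placeholders, the auth lines are simplified, and the alu scheduler's tuning options are left out.

  # --- server spec (sketch; paths, address and auth patterns are placeholders) ---

  volume ns
    type storage/posix
    option directory /export/ns          # namespace brick
  end-volume

  volume brick1-posix
    type storage/posix
    option directory /export/brick1
  end-volume

  volume brick1-locks
    type features/posix-locks
    subvolumes brick1-posix
  end-volume

  volume brick1                          # io-threads on top of each data brick
    type performance/io-threads
    option thread-count 16
    option cache-size 64MB
    subvolumes brick1-locks
  end-volume

  volume brick2-posix
    type storage/posix
    option directory /export/brick2
  end-volume

  volume brick2-locks
    type features/posix-locks
    subvolumes brick2-posix
  end-volume

  volume brick2
    type performance/io-threads
    option thread-count 16
    option cache-size 64MB
    subvolumes brick2-locks
  end-volume

  volume server
    type protocol/server
    option transport-type tcp/server
    option auth.ip.ns.allow *
    option auth.ip.brick1.allow *
    option auth.ip.brick2.allow *
    subvolumes ns brick1 brick2
  end-volume

  # --- client spec (same caveats as above) ---

  volume client-ns
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.0.1       # placeholder server address
    option remote-subvolume ns
  end-volume

  volume client1
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.0.1
    option remote-subvolume brick1
  end-volume

  volume client2
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.0.1
    option remote-subvolume brick2
  end-volume

  volume unify
    type cluster/unify
    option namespace client-ns
    option scheduler alu                 # alu tuning options omitted in this sketch
    subvolumes client1 client2
  end-volume

  volume iot
    type performance/io-threads
    option thread-count 16
    option cache-size 64MB
    subvolumes unify
  end-volume

  volume iocache
    type performance/io-cache
    option cache-size 256MB
    subvolumes iot
  end-volume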