Matt,

We are currently investigating AFR + io-cache performance issues
(io-cache does not make full use of caching when AFR does load-balanced
reads). As a temporary workaround, you could override AFR read
scheduling by specifying 'option read-subvolume <first subvolume>' in
AFR. Apart from that, I suggest you mount glusterfs with large -e and -a
argument values, which should improve performance quite a bit in such
cases.

Do let us know if that makes any difference.

avati

2008/1/14, Matt Drew <matt.drew@xxxxxxxxx>:
>
> I've been digging into a seemingly difficult performance issue over
> the last few days. We're running glusterfs mainline 2.5 patch 628,
> fuse-2.7.2 with Marian's glfs patch, kernel 2.6.23, currently one
> server and two clients (soon to be two and four, respectively). The
> server is a dual-core Opteron with a SATA2 disk (one, we're planning
> on AFR redundancy), the clients are dual-core Intel machines. The
> network transport is gigabit ethernet. The server is 32-bit and the
> clients are 64-bit (I can rebuild the server no problem if that is the
> issue). Throughput is good, and activity by one process seems to work
> fine.
>
> Our issue is with a PHP script running on the client via the glusterfs
> share. The script has a number of includes, and those files have a
> few more includes. This means a lot of stats as the webserver checks
> to make sure none of the files have changed. If we make one call to
> the script, everything is fine - the code completes in 300ms.
> Similarly, if you run "ls -l" on a large directory (1700 files)
> everything appears to work fine (from local disk the code completes in
> 100ms).
>
> However, if we make two concurrent calls to the PHP script, or run two
> copies of ls -l on the large directory, everything slows down by an
> order of magnitude. The output of the ls commands appears to stutter
> on each copy - usually one will stop and the other will start, but
> sometimes both will stop for a second or two.
> Adding a third process makes it worse. The PHP script takes 2.5 or 3
> seconds to complete, instead of 300ms, and again more requests make it
> worse - if you request four operations concurrently, the finish time
> jumps to 7 seconds. This issue occurs whether you are on a single
> client with two processes, or if you are on two clients with one
> process each.
>
> Inserting the trace translator doesn't turn up anything unusual that I
> can see, with the exception that it makes the processes run even
> slower (which is expected, of course). A tcpdump of the filesystem
> traffic shows inexplicable gaps of 100ms or more with no traffic. The
> single process "ls -l" test does not show these gaps.
>
> I stripped the server and client to the bare minimum with unify. This
> didn't seem to make a difference. I'm currently running this
> server/client stack, also without success:
>
> ns
> brick (x2)
> posix-locks
> io-threads(16, 64MB)
> server (ns, brick1, brick2)
>
> brick1
> brick2
> unify(alu)
> io-threads(16, 64MB)
> io-cache(256MB)
>
> At various times I've tried read-ahead with no discernible difference.
> An strace of the client process doesn't return anything interesting
> except a lot of these:
>
> futex(0x12345678, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource
> temporarily unavailable)
>
> These also appear during a single process test, but they are much more
> prevalent when two processes are running.
>
> What am I doing wrong? :)
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>

-- 
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.