Matt,
for the sake of diagnosing, can you try specifying "option self-heal off" in
the cluster/unify volume and try again?

thanks,
avati

2008/1/14, Matt Drew <matt.drew@xxxxxxxxx>:
>
> Avati,
>
> I tried values of 2, 4, 6, 30, 60, and 120 for -e and -a with no
> measurable effect.  We're not using AFR yet so there's no issue there.
>
> On Jan 14, 2008 12:05 AM, Anand Avati <avati@xxxxxxxxxxxxx> wrote:
> > Matt,
> > we are currently investigating AFR + io-cache performance issues
> > (io-cache not really making full use of caching when AFR does load
> > balanced reads). You could override AFR read scheduling by specifying
> > 'option read-subvolume <first subvolume>' in AFR as a temporary
> > workaround. That apart, I suggest you mount glusterfs with large -e
> > and -a argument values which should improve performance in such cases
> > quite a bit. Do let us know if that made any difference.
> >
> > avati
> >
> > 2008/1/14, Matt Drew <matt.drew@xxxxxxxxx>:
> > >
> > > I've been digging into a seemingly difficult performance issue over
> > > the last few days.  We're running glusterfs mainline 2.5 patch 628,
> > > fuse-2.7.2 with Marian's glfs patch, kernel 2.6.23, currently one
> > > server and two clients (soon to be two and four, respectively).  The
> > > server is a dual-core Opteron with a SATA2 disk (one, we're planning
> > > on AFR redundancy), the clients are dual-core Intel machines.  The
> > > network transport is gigabit ethernet.  The server is 32-bit and the
> > > clients are 64-bit (I can rebuild the server no problem if that is the
> > > issue).  Throughput is good, and activity by one process seems to work
> > > fine.
> > >
> > > Our issue is with a PHP script running on the client via the glusterfs
> > > share.  The script has a number of includes, and those files have a
> > > few more includes.  This means a lot of stats as the webserver checks
> > > to make sure none of the files have changed.
> > > If we make one call to the script, everything is fine - the code
> > > completes in 300ms.  Similarly, if you run "ls -l" on a large
> > > directory (1700 files) everything appears to work fine (from local
> > > disk the code completes in 100ms).
> > >
> > > However, if we make two concurrent calls to the PHP script, or run two
> > > copies of ls -l on the large directory, everything slows down by an
> > > order of magnitude.  The output of the ls commands appears to stutter
> > > on each copy - usually one will stop and the other will start, but
> > > sometimes both will stop for a second or two.  Adding a third process
> > > makes it worse.  The PHP script takes 2.5 or 3 seconds to complete,
> > > instead of 300ms, and again more requests make it worse - if you
> > > request four operations concurrently, the finish time jumps to 7
> > > seconds.  This issue occurs whether you are on a single client with
> > > two processes, or if you are on two clients with one process each.
> > >
> > > Inserting the trace translator doesn't turn up anything unusual that I
> > > can see, with the exception that it makes the processes run even
> > > slower (which is expected, of course).  A tcpdump of the filesystem
> > > traffic shows inexplicable gaps of 100ms or more with no traffic.  The
> > > single process "ls -l" test does not show these gaps.
> > >
> > > I stripped the server and client to the bare minimum with unify.  This
> > > didn't seem to make a difference.  I'm currently running this
> > > server/client stack, also without success:
> > >
> > > ns
> > > brick (x2)
> > > posix-locks
> > > io-threads(16, 64MB)
> > > server (ns, brick1, brick2)
> > >
> > > brick1
> > > brick2
> > > unify(alu)
> > > io-threads(16, 64MB)
> > > io-cache(256MB)
> > >
> > > At various times I've tried read-ahead with no discernible difference.
> > > An strace of the client process doesn't return anything interesting
> > > except a lot of these:
> > >
> > > futex(0x12345678, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > >
> > > These also appear during a single process test, but they are much more
> > > prevalent when two processes are running.
> > >
> > > What am I doing wrong? :)
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@xxxxxxxxxx
> > > http://lists.nongnu.org/mailman/listinfo/gluster-devel

--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.
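
For anyone who wants to repeat the "two copies of ls -l" test from the original report in a measurable way, here is a minimal sketch. It is not from the thread: the function name time_concurrent_ls is made up for illustration, and the default directory is only a placeholder (the system temp directory), so pass the path of a large directory on the glusterfs mount as the first argument. It starts N copies of `ls -l` at once and reports the wall-clock time until all of them exit.

```python
import subprocess
import sys
import tempfile
import time


def time_concurrent_ls(path, copies):
    """Start `copies` simultaneous `ls -l` processes on `path`.

    Returns the wall-clock seconds until every copy has exited.
    Hypothetical helper, not part of glusterfs or the original thread.
    """
    start = time.perf_counter()
    procs = [
        # Output is discarded; ls -l still stats every directory entry.
        subprocess.Popen(["ls", "-l", path], stdout=subprocess.DEVNULL)
        for _ in range(copies)
    ]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder default; pass the real glusterfs directory as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    for copies in (1, 2, 4):
        elapsed_ms = time_concurrent_ls(target, copies) * 1000
        print(f"{copies} x ls -l: {elapsed_ms:.0f} ms")
```

If the behaviour matches the report, the time for 2 and 4 copies should grow far more than linearly compared to the single copy. Running the same script against a local directory gives a baseline to compare with.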