That seems to have made it slightly worse. The concurrent "ls -l"s
seemed better, but the web server execution was slightly slower.

On Jan 14, 2008 6:41 PM, Anand Avati <avati@xxxxxxxxxxxxx> wrote:
> Matt,
> for the sake of diagnosing, can you try specifying "option self-heal off"
> in the cluster/unify volume and test again?
>
> thanks,
>
> avati
>
> 2008/1/14, Matt Drew <matt.drew@xxxxxxxxx>:
> > Avati,
> >
> > I tried values of 2, 4, 6, 30, 60, and 120 for -e and -a with no
> > measurable effect. We're not using AFR yet, so there's no issue there.
> >
> > On Jan 14, 2008 12:05 AM, Anand Avati <avati@xxxxxxxxxxxxx> wrote:
> > > Matt,
> > > we are currently investigating AFR + io-cache performance issues
> > > (io-cache not really making full use of caching when AFR does
> > > load-balanced reads). You could override AFR read scheduling by
> > > specifying 'option read-subvolume <first subvolume>' in AFR as a
> > > temporary workaround. That apart, I suggest you mount glusterfs
> > > with large -e and -a argument values, which should improve
> > > performance in such cases quite a bit. Do let us know if that
> > > made any difference.
> > >
> > > avati
> > >
> > > 2008/1/14, Matt Drew <matt.drew@xxxxxxxxx>:
> > > > I've been digging into a seemingly difficult performance issue
> > > > over the last few days. We're running glusterfs mainline 2.5
> > > > patch 628, fuse-2.7.2 with Marian's glfs patch, and kernel
> > > > 2.6.23, currently with one server and two clients (soon to be
> > > > two and four, respectively). The server is a dual-core Opteron
> > > > with a single SATA2 disk (we're planning on AFR for redundancy);
> > > > the clients are dual-core Intel machines. The network transport
> > > > is gigabit ethernet. The server is 32-bit and the clients are
> > > > 64-bit (I can rebuild the server, no problem, if that is the
> > > > issue). Throughput is good, and activity by one process seems
> > > > to work fine.
> > > >
> > > > Our issue is with a PHP script running on the client via the
> > > > glusterfs share. The script has a number of includes, and those
> > > > files have a few more includes. This means a lot of stats as the
> > > > webserver checks to make sure none of the files have changed. If
> > > > we make one call to the script, everything is fine - the code
> > > > completes in 300ms (from local disk it completes in 100ms).
> > > > Similarly, if you run "ls -l" on a large directory (1700 files),
> > > > everything appears to work fine.
> > > >
> > > > However, if we make two concurrent calls to the PHP script, or
> > > > run two copies of "ls -l" on the large directory, everything
> > > > slows down by an order of magnitude. The output of the ls
> > > > commands appears to stutter on each copy - usually one will stop
> > > > and the other will start, but sometimes both will stop for a
> > > > second or two. Adding a third process makes it worse. The PHP
> > > > script takes 2.5 or 3 seconds to complete instead of 300ms, and
> > > > again more requests make it worse - if you request four
> > > > operations concurrently, the finish time jumps to 7 seconds.
> > > > This issue occurs whether the two processes run on a single
> > > > client or one on each of two clients.
> > > >
> > > > Inserting the trace translator doesn't turn up anything unusual
> > > > that I can see, except that it makes the processes run even
> > > > slower (which is expected, of course).
> > > > A tcpdump of the filesystem traffic shows inexplicable gaps of
> > > > 100ms or more with no traffic. The single-process "ls -l" test
> > > > does not show these gaps.
> > > >
> > > > I stripped the server and client down to a bare-minimum unify
> > > > configuration, which didn't seem to make a difference. I'm
> > > > currently running this server/client stack, also without success:
> > > >
> > > > server:
> > > >   ns
> > > >   brick (x2)
> > > >   posix-locks
> > > >   io-threads (16 threads, 64MB)
> > > >   server (ns, brick1, brick2)
> > > >
> > > > client:
> > > >   brick1
> > > >   brick2
> > > >   unify (alu)
> > > >   io-threads (16 threads, 64MB)
> > > >   io-cache (256MB)
> > > >
> > > > At various times I've tried read-ahead with no discernible
> > > > difference. An strace of the client process doesn't return
> > > > anything interesting except a lot of these:
> > > >
> > > > futex(0x12345678, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource
> > > > temporarily unavailable)
> > > >
> > > > These also appear during a single-process test, but they are much
> > > > more prevalent when two processes are running.
> > > >
> > > > What am I doing wrong? :)
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxx
> > > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> > >
> > > --
> > > If I traveled to the end of the rainbow
> > > As Dame Fortune did intend,
> > > Murphy would be there to tell me
> > > The pot's at the other end.
>
> --
> If I traveled to the end of the rainbow
> As Dame Fortune did intend,
> Murphy would be there to tell me
> The pot's at the other end.
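
For anyone trying to reproduce the symptom described in the thread,
the failing case boils down to two concurrent readers of one large
directory over the mount. This is only a sketch; the mount point and
directory name are hypothetical:

# one reader: "ls -l" over glusterfs completes quickly
time ls -l /mnt/glusterfs/bigdir > /dev/null

# two concurrent readers: output stutters and wall time grows
# by roughly an order of magnitude
ls -l /mnt/glusterfs/bigdir > /dev/null &
ls -l /mnt/glusterfs/bigdir > /dev/null &
time wait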
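
For reference, the server half of the stack described above would
correspond roughly to a volfile like the one below. This is a sketch
only: the directory paths and volume names are invented, and option
names follow the glusterfs 1.3/mainline-2.5 conventions of the time.

# server.vol -- sketch of the server-side stack
volume ns
  type storage/posix
  option directory /export/ns
end-volume

volume brick1-posix
  type storage/posix
  option directory /export/brick1
end-volume

volume brick1-locks
  type features/posix-locks
  subvolumes brick1-posix
end-volume

volume brick1
  type performance/io-threads
  option thread-count 16
  option cache-size 64MB
  subvolumes brick1-locks
end-volume

# brick2-posix, brick2-locks, and brick2 are defined the same way
# over /export/brick2

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.ns.allow *
  option auth.ip.brick1.allow *
  option auth.ip.brick2.allow *
  subvolumes ns brick1 brick2
end-volume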
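
The matching client half would look roughly like this; the server
address is a placeholder, and tuning options for the alu scheduler
are omitted for brevity:

# client.vol -- sketch of the client-side stack
volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.10
  option remote-subvolume ns
end-volume

volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.10
  option remote-subvolume brick1
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.10
  option remote-subvolume brick2
end-volume

volume unify0
  type cluster/unify
  option scheduler alu
  option namespace ns
  subvolumes brick1 brick2
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 16
  option cache-size 64MB
  subvolumes unify0
end-volume

volume iocache
  type performance/io-cache
  option cache-size 256MB
  subvolumes iothreads
end-volume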
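
The two changes suggested upthread would slot into that client
volfile roughly as follows. The AFR volume is purely illustrative,
since AFR is not in use yet, and the mount line assumes the -f
spec-file flag of releases from that era:

# Avati's diagnostic suggestion: disable self-heal in unify
volume unify0
  type cluster/unify
  option scheduler alu
  option namespace ns
  option self-heal off
  subvolumes brick1 brick2
end-volume

# temporary workaround for the AFR + io-cache issue, for whenever
# AFR is introduced: pin all reads to the first subvolume
volume afr0
  type cluster/afr
  option read-subvolume brick1
  subvolumes brick1 brick2
end-volume

# mount with large FUSE entry (-e) and attribute (-a) cache
# timeouts, in seconds
glusterfs -e 60 -a 60 -f /etc/glusterfs/client.vol /mnt/glusterfs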