The CPU usage issue turned out to be a red herring - I found that someone else was testing writes while I was working on reads and the webserver. There were no lk() calls in either the debug logs or the trace file (probably because locks were disabled while I was tracing). I've now removed the locks and io-threads translators on the server.

Current testing environment (a rough sketch of the matching spec files is at the bottom of this mail):

server:
brick-ns
brick1
brick2
server

client:
brick-ns
brick1
brick2
unify

I've also discovered that the glusterfs server is performing about the same as an older PPC OS X machine we use for testing. I'm now looking at the two to see if I can identify something they have in common that could cause the slowdown.

On Jan 17, 2008 9:56 PM, Anand Avati <avati@xxxxxxxxxxxxx> wrote:
> Matt,
>  replies inline -
>
> >
> > I've removed all the performance translators on the client. The
> > server has io-threads just below the server export layer, and I
> > replaced the lock translators with debug translators. So my stack now
> > looks like this:
> >
> > bricks
> > traces
> > io-threads
> > server
> >
> > client bricks
> > unify (self-heal turned back on)
> >
> > I had checked the server, but not thoroughly. When our scripts are
> > called, the glusterfsd CPU usage shoots up to 50-60% of one CPU
> > (Opteron 1210 dual-core). This is much higher than the client (which
> > is usually 4-10% CPU during the same period).
>
> Is this observed after loading the trace translator? trace can eat a lot
> of CPU cycles, as it sets the overall logging level to DEBUG.
>
> > The peak coincides with the slow execution, and then immediately drops
> > off - even though there is still quite a bit of I/O going on. I see a
> > similar spike on writes, sustained for large files. The debug trace
> > (just above the server brick) contains enormous numbers of inode
> > creations, activations, passivations, and destroys, but nothing that
> > stands out as broken. Still, the high CPU usage seems odd, especially
> > on reads - it doesn't seem like there should be that much activity.
>
> Can you grep through the log file and see if there are any references to
> lk() calls happening? If so, can you retry your runs with the posix-locks
> translator removed? (for the sake of diagnosis)
>
> thanks,
> avati
>
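
For reference, here is roughly what the spec files for the stack above could look like, assuming the standard volume-spec syntax. The directory paths, server address, and scheduler choice below are placeholders rather than our actual settings.

Server spec (sketch):

volume brick-ns
  type storage/posix
  option directory /export/ns          # placeholder path for the namespace export
end-volume

volume brick1
  type storage/posix
  option directory /export/brick1      # placeholder path
end-volume

volume brick2
  type storage/posix
  option directory /export/brick2      # placeholder path
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick-ns brick1 brick2
  option auth.ip.brick-ns.allow *
  option auth.ip.brick1.allow *
  option auth.ip.brick2.allow *
end-volume

Client spec (sketch):

volume brick-ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1       # placeholder server address
  option remote-subvolume brick-ns
end-volume

volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1
  option remote-subvolume brick1
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1
  option remote-subvolume brick2
end-volume

volume unify
  type cluster/unify
  option namespace brick-ns
  option scheduler rr                  # placeholder; substitute whatever scheduler is already configured
  subvolumes brick1 brick2
end-volume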