Hey Dave,

Thanks for the suggestions.

On Tue, Jul 27, 2010 at 09:22:24AM +1000, Dave Chinner wrote:
> On Mon, Jul 26, 2010 at 05:09:09PM -0500, Shawn Bohrer wrote:
> > On Fri, Jul 23, 2010 at 09:28:46AM +1000, Dave Chinner wrote:
> > > On Thu, Jul 22, 2010 at 09:47:06AM -0500, Shawn Bohrer wrote:
> > > > Hello,
> > > >
> > > > We have an application that receives data over the network, and then
> > > > writes data out to a shared memory mapped file on xfs. Previously
> > > > everything worked great when we were running a CentOS 5.3 kernel
> > > > (2.6.18-128.7.1.el5), but when we upgraded to the newer CentOS 5.4 or
> > > > 5.5 kernels we started to see occasional spikes (up to 1 second) in
> > > > the time it takes to write the data to the memory mapped file. I
> > > > noticed that between the CentOS 5.3 and 5.4 kernels the upstream xfs
> > > > changes were backported. Testing on 2.6.32.16 shows the same
> > > > occasional latency spikes.
> > > >
> > > > After doing some debugging on 2.6.32.16 I have found the cause of the
> > > > latency spikes. The spikes occur every time pdflush runs to flush the
> > > > dirty pages. Note our app only writes data at about 1.5 MB/s and we
> > > > are well below both the dirty_background_ratio and the dirty_ratio, so
> > > > these flushes are only of pages that are dirty_expire_centisecs old.
> > > > Capturing one of the spikes with ftrace shows:
> > > >
> > > > ...
> > > > flush-8:0-601 [011] 1291592.127590: funcgraph_entry:            |  xfs_ilock() {
> > > > flush-8:0-601 [011] 1291592.127590: funcgraph_entry:            |    down_write() {
> > > > flush-8:0-601 [011] 1291592.127590: funcgraph_entry:  0.213 us  |      _cond_resched();
> > > > flush-8:0-601 [011] 1291592.127591: funcgraph_exit:   0.657 us  |    }
> > > > flush-8:0-601 [011] 1291592.127591: funcgraph_exit:   1.174 us  |  }
> > > > flush-8:0-601 [011] 1291592.127764: sched_switch: task flush-8:0:601 [120] (D) ==> swapper:0 [120]
> > > > ...
> > >
> > > what is the lead-in context to this hunk? i.e. how does xfs_ilock()
> > > get called? That will tell me if it is blocking on the iolock or the
> > > ilock...
> >
> > I don't have the lead-in context here. I'm using the function_graph
> > tracer, and with stalls up to a second in length it easily overflowed
> > the ftrace ring buffer. Once I found my process was blocking on the
> > xfs_ilock(), I limited my traces to include xfs_ilock, xfs_iunlock,
> > handle_mm_fault, and the scheduler events. If you are still
> > interested in seeing some more detail here I can try to collect some
> > more captures.
>
> I'm not sure if the tracepoint buffer is the same as the ftrace
> buffer, but if it is you can increase the size by doing this sort of
> thing:
>
> # echo 32768 > /sys/kernel/debug/tracing/buffer_size_kb
>
> I often do that to capture long XFS traces during testing....
>
> [snip]
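
For reference, the trace setup I have been using looks roughly like the
following (typed from memory, so treat it as a sketch; the exact paths
and event names can differ between kernel versions, and the output file
is only an example):

# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo xfs_ilock xfs_iunlock handle_mm_fault > /sys/kernel/debug/tracing/set_ftrace_filter
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
# cat /sys/kernel/debug/tracing/trace_pipe > /tmp/xfs-trace.txt   # example output path

I'll bump buffer_size_kb as you suggest the next time I try to catch a
full stall.
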
> > > Hmmm - perhaps the issue is that the transaction commit has not been
> > > able to get an iclog buffer to write the transaction into and so is
> > > sleeping waiting for an iclog IO to complete. That would be woken by
> > > the xfslogd, and then the transaction would write into the
> > > now-available iclog and unlock the inode.
> > >
> > > So, I'd say that in this case your application is waiting for log IO
> > > completion to occur. Nothing you can do to avoid that, but you can
> > > minimise the impact - increase the log buffer size via the mount
> > > option logbsize (i.e. -o logbsize=262144) to reduce the frequency of
> > > iclog buffer IOs and make it less likely the system will stall
> > > having run out of iclog space.
> > >
> > > The other possibility is that the log IO is being slowed by barriers
> > > being issued with the log writes. If you've got non-volatile caches
> > > on your storage, you can turn off barriers and that should help
> > > reduce log IO latency significantly.
> >
> > I've tried mounting with "noatime,nobarrier,logbufs=8,logbsize=262144"
> > but I don't notice much difference and still see the stalls.
>
> Ok. One other possibility came to mind - the transaction commit is
> blocking while locking the superblock buffer to apply free block
> count changes. Do you have the "lazy-count" option set (xfs_info
> <mtpt> will tell you)? If it isn't set, then transaction commit
> latency can be quite high if the sb buffer is being written to disk
> when the transaction tries to lock it. If it is not set, you can use
> xfs_admin to set that bit (mkfs has defaulted to lazy-count=1 for
> some time now).

Setting lazy-count=1 looks like it has helped quite a bit in reducing
the frequency of the spikes. I am still seeing spikes of up to a
second, but I haven't had time to capture any more traces to verify
whether they are the same issue. Overall, though, this may be good
enough.

Thanks,
Shawn
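
P.S. In case it is useful to anyone else following the thread, checking
and changing the flag went roughly like this (the device and mount point
names are only placeholders for my setup, and xfs_admin needs the
filesystem unmounted):

# xfs_info /data | grep lazy-count   # /data is an example mount point
# umount /data
# xfs_admin -c 1 /dev/sdb1           # example device node
# mount /dev/sdb1 /data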