On Tue, Apr 24, 2012 at 09:27:59AM +1000, Dave Chinner wrote:
> On Mon, Apr 23, 2012 at 04:24:41PM -0400, Josef 'Jeff' Sipek wrote:
> > On Sat, Apr 21, 2012 at 10:29:32AM +1000, Dave Chinner wrote:
> > ...
> > > but also given the length of the incident, some other data is definitely
> > > needed:
> > >     - a 30s event trace - it'll compress pretty well
> > >       (trace-cmd record -e xfs* sleep 30; trace-cmd report > output.txt)
> >
> > http://31bits.net/download/output-1335211829.txt.bz2
>
> Ok, that's instructive. Inode RMW cycles in the xfsaild:
...
> Every buffer read is followed by a queuing for write. It is also
> showing that it is typically taking 10-25ms per inode read IO, which
> is exactly what I'd expect for your given storage. There are 2500 of
> these over the 30s period, which translates to about one every 12ms
> across the 30s sample.
>
> So, yes, your hangs are definitely due to inode buffer RMW cycles
> when trying to flush dirty inodes from the cache. I have a few
> ideas on how to fix this - but I'm not sure whether a current TOT
> solution will be easily back-portable.

If it is too much effort to backport, we should be able to move the box to
a 3.3 stable kernel (assuming no driver problems).

> The simplest solution is a readahead based solution - AIL pushing is
> async, and will cycle back to inodes that it failed to flush the first
> time past, so triggering readahead on the first pass might work just fine.
...

This makes sense. With a large enough log, could you not end up evicting
the readahead inodes by the time you get back to them?

> That way the xfsaild will make a pass across the AIL doing
> readahead and doesn't block on RMW cycles. Effectively we get async
> RMW cycles occurring, and the latency of a single cycle will no
> longer be the performance limiting factor. I'll start to prototype
> something to address this - it isn't a new idea, and I've seen it
> done before, so I should be able to get something working.

Cool. Let me know when you have something we can try. I don't quite know
what it is that's causing this giant backlog of inode modifications - I
suspect it's the rsync that's pushing it over. But regardless, I'm
interested in testing the fix.

Thanks!

Jeff.

-- 
Fact: 30.3% of all statistics are generated randomly.
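
P.S. Just to check my understanding of the two-pass idea: something like the
userspace analogy below, where the first pass only queues readahead and the
second pass comes back to do the synchronous read-modify-write, so the reads
mostly hit cache by then. This is not meant to resemble the actual xfsaild
code - the block size, block count and the "modify" step are all made up for
illustration.

/*
 * Userspace sketch of "readahead pass, then RMW pass" (NOT XFS code).
 * Pass 1 issues async readahead hints for every block we intend to
 * touch; pass 2 does the actual read-modify-write.  The point is that
 * a single RMW's read latency stops being the rate-limiting step.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLKSZ   4096
#define NBLOCKS 2500            /* roughly the buffer count from the trace */

int main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Pass 1: readahead hints only, no blocking on IO. */
        for (int i = 0; i < NBLOCKS; i++)
                posix_fadvise(fd, (off_t)i * BLKSZ, BLKSZ, POSIX_FADV_WILLNEED);

        /* Pass 2: read-modify-write; the reads should now be cache hits. */
        char buf[BLKSZ];
        for (int i = 0; i < NBLOCKS; i++) {
                off_t off = (off_t)i * BLKSZ;

                if (pread(fd, buf, BLKSZ, off) != BLKSZ)
                        break;          /* short file, stop here */
                buf[0] ^= 1;            /* stand-in for the "modify" step */
                if (pwrite(fd, buf, BLKSZ, off) != BLKSZ) {
                        perror("pwrite");
                        break;
                }
        }

        close(fd);
        return 0;
}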