On Tue, May 08, 2012 at 08:59:44AM +1000, Dave Chinner wrote: > On Sat, May 05, 2012 at 09:44:35AM +0200, Juerg Haefliger wrote: > > >> But better would be to use the device number of the relevant > > >> filesystem to filter the events. The device is 252:2, which means in > > >> kernel terms is it: > > >> > > >> dev = (major << 20) | minor > > >> = 0xfc00002 > > >> > > >> So you should be able to get just the xfs-hang events via: > > >> > > >> # trace-cmd record -e xfs\* -d 'dev == 0xfc00002' > > >> > > >> and as long as you don't host log files on /xfs-hang, it'll only > > >> record the workload running on the xfs-hang filesystem. > > > > > > Third try: https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-27-180secs.tgz > > > Filtered by device, trace events go to a different filesystem. > > > > Did anybody have a chance to look at the data? > > I've had a quick look, but I need to write scripts to visualise it > (i.e. graph it) to determine if there's any pattern to the issue. And, as expected, something unexpected popped out. Judicious use of awk on the log space grant events shows an interesting pattern occuring from time to time: Transaction Wait queues Grant head Write Head Log head Log tail ----------------------------------------------------------------------------------------- INACTIVE empty empty 118 438024 118 438024 118 802 118 697 INACTIVE empty empty 119 20240 119 20240 118 802 118 697 REMOVE empty empty 118 438456 118 438456 118 802 118 697 REMOVE empty empty 118 438772 118 438772 118 802 118 697 CREATE empty empty 119 35872 119 35872 118 802 118 697 FSYNC_TS active empty 119 205428 119 205428 118 802 118 697 FSYNC_TS active empty 119 202944 119 202944 118 802 118 697 FSYNC_TS active empty 119 200664 119 200664 118 802 118 697 REMOVE empty empty 118 380316 118 380316 118 724 118 652 INACTIVE empty empty 118 552532 118 552532 118 724 118 652 FSYNC_TS empty empty 118 382140 118 382140 118 724 118 652 INACTIVE empty empty 118 565968 118 565968 118 724 118 652 REMOVE empty empty 119 25404 119 25404 118 802 118 697 INACTIVE active empty 119 25580 119 25580 118 802 118 697 Anyone notice something fishing with the log head and tail? That's right, they go *backwards* when a particular REMOVE transaction is executed. That's, well, completely unexpected, and completely breaks the assumptions made in the log space reservation code. Essentially, the log tail ismoving backwards, and the head is relative to the tail so moves with it. That is *nasty*. It means that if this occurs at just the right (wrong) time (i.e. just before a checkpoint), we can overwrite log metadata and essentially corrupt the log. If we were to crash with something like this active in the log, then log reocvery would fail and or corrupt the filesystem. It is however, very hard to trigger a crash while this conditions exists because, as you can see, it only existed for 4 transactions - about 100uS according to the traces in this case. I think this has to do with how inode allocation/unlink buffers are handled - we handle their position (via their LSN) in the AIL specially, and I think that's affecting the in memory view of the log tail and hence how much space is really available in the log. I need to do more investigation to really understand the way it fails, and write a reproducable test case, but I think I've found the smoking gun.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs