On Fri, Apr 27, 2012 at 1:07 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Fri, Apr 27, 2012 at 01:00:08AM +0200, Juerg Haefliger wrote:
>> On Fri, Apr 27, 2012 at 12:44 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Thu, Apr 26, 2012 at 02:37:50PM +0200, Juerg Haefliger wrote:
>> >> On Thu, Apr 26, 2012 at 12:38 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> >> > On Tue, Apr 24, 2012 at 08:26:04PM +0200, Juerg Haefliger wrote:
>> >> >> On Tue, Apr 24, 2012 at 2:07 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> >> >> > On Tue, Apr 24, 2012 at 10:55:22AM +0200, Juerg Haefliger wrote:
>> >> >> >> > Alright, then I need all the usual information. I suspect an event
>> >> >> >> > trace is the only way I'm going to see what is happening. I just
>> >> >> >> > updated the FAQ entry, so all the necessary info for gathering a
>> >> >> >> > trace should be there now.
>> >> >> >> >
>> >> >> >> > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>> >> >> >>
>> >> >> >> Very good. Will do. What kernel do you want me to run? I would prefer
>> >> >> >> our current production kernel (2.6.38-8-server) but I understand if
>> >> >> >> you want something newer.
>> >> >> >
>> >> >> > If you can reproduce it on a current kernel - 3.4-rc4 if possible, if
>> >> >> > not a 3.3.x stable kernel would be best. 2.6.38 is simply too old to
>> >> >> > be useful for debugging these sorts of problems...
>> >> >>
>> >> >> OK, I reproduced a hang running 3.4-rc4. The data is here but it's a
>> >> >> whopping 2GB (yes it's compressed):
>> >> >> https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-24.tar
>> >> >
>> >> > That's a bit big to be useful, and far bigger than I'm willing to
>> >> > download given that I'm on the end of a wet piece of string, not a
>> >> > big fat intarwebby pipe.
>> >>
>> >> Fair enough.
>> >>
>> >> > I'm assuming it is the event trace
>> >> > that is causing it to blow out? If so, just the 30-60s either side of
>> >> > the hang first showing up is probably necessary, and that should cut
>> >> > the size down greatly....
>> >>
>> >> Can I shorten the existing trace.dat?
>> >
>> > No idea, but that's likely the problem - I don't want the binary
>> > trace.dat file. I want the text output of the report command
>> > generated from the binary trace.dat file...
>>
>> Well yes. I did RTFM :-) trace.dat is 15GB.
>
> OK, that's a lot larger than I expected for a hung filesystem....
>
>> >> I stopped the trace
>> >> automatically 10 secs after the xlog_... trace showed up in syslog
>> >> so effectively some 130+ secs after the hang occurred.
>
> Can you look at the last timestamp in the report file, and trim off
> anything from the start that is older than, say, 180s before that?

Cut the trace down to 180 secs, which brought the file size down to 93MB:
https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-24-180secs.tgz

...Juerg

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
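The trimming Dave asks for in this thread (find the last timestamp in the text report generated from trace.dat, then drop everything older than about 180s before it) is easy to script. What follows is a minimal sketch in Python, not a trace-cmd feature. It assumes each event line in the report carries a `seconds.microseconds:` timestamp surrounded by whitespace, as in typical `trace-cmd report` output. The regex, the function names `latest_timestamp`/`trim_lines` and the 180s default are all illustrative assumptions. Lines without a timestamp inherit the timestamp of the preceding event line, and header lines before the first event are kept.

```python
import re
import sys

# Assumed shape of an event line in the text report (illustrative only):
#   "  kworker/0:1-123   [000]  1234.567890: xfs_ilock: ..."
# i.e. a decimal seconds.micros timestamp followed by ": ".
TIMESTAMP_RE = re.compile(r"\s(\d+\.\d+):\s")


def latest_timestamp(lines):
    """Return the largest event timestamp found in lines, or None if none match."""
    latest = None
    for line in lines:
        m = TIMESTAMP_RE.search(line)
        if m:
            t = float(m.group(1))
            if latest is None or t > latest:
                latest = t
    return latest


def trim_lines(lines, cutoff):
    """Yield lines whose own (or inherited) timestamp is >= cutoff.

    Lines before the first timestamped event (report headers) are kept;
    other untimestamped lines inherit the previous event's timestamp.
    """
    current = None
    for line in lines:
        m = TIMESTAMP_RE.search(line)
        if m:
            current = float(m.group(1))
        if current is None or current >= cutoff:
            yield line


def main(path, window_secs=180.0):
    # Pass 1: find the latest timestamp without loading the whole report.
    with open(path, encoding="utf-8", errors="replace") as f:
        latest = latest_timestamp(f)
    if latest is None:
        sys.exit("no timestamped events found")
    # Pass 2: stream out only the last window_secs of events.
    with open(path, encoding="utf-8", errors="replace") as f:
        sys.stdout.writelines(trim_lines(f, latest - window_secs))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as `trim_report.py`, it would be run as `python trim_report.py report.txt > report-180secs.txt`. Reading the file twice instead of loading it into a list keeps memory use flat, which matters when the text report of a 15GB trace.dat is itself many gigabytes.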