Re: hunting an IO hang

On Mon, Jan 17, 2011 at 04:23:56PM -0500, Chris Mason wrote:
> Excerpts from Linus Torvalds's message of 2011-01-17 13:24:55 -0500:
> > On Mon, Jan 17, 2011 at 9:40 AM, Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> > >> >
> > >> > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> > >> > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> > >> > than any runs in the past.
> > >> >
> > >>
> > >> Confirmed that reverting these patches makes the problem unreproducible
> > >> for the many_dd's + fsmark for at least an hour here.
> > >
> > > After 2+ hours I'm still running with those two commits gone.  I'm
> > > confident they are the cause of the crashes.  I also haven't triggered
> > > the cfq stalls without them.
> > 
> > Ok, so the question is how to proceed from here.
> > 
> > I can easily revert them, and since I was planning on doing -rc1
> > tonight, I probably will. But I promised Chris to delay until tomorrow
> > if he needed time to chase this down, and while it's now apparently
> > chased down, I'll certainly also be open to delaying until tomorrow if
> > somebody has a patch to fix it.
> > 
> > So right now my plan is:
> >  - I will revert those two later today and then release -rc1 in the evening
> > UNLESS
> >  - somebody posts a patch for the problem in the next few hours and
> > Chris/others are willing to give it a good test overnight (or whatever
> > people feel is "sufficient" based on how easily they can trigger the
> > issue), in which case I'd do -rc1 tomorrow (either with the reverts or
> > the patch, depending on how testing works out)
> 
> If a patch does come in, I'm happy to test it.  Mel had a test that
> triggered within 1-2 minutes, mine took 30 or so, which means I'd want a
> 2 hour run to convince myself it was really fixed.  But, I'll give Mel's
> fs_mark + dd workload a try on the buggy kernel.
> 

I spent a while looking for a simple patch, but the problem is not trivially
fixable. __activate_page() is called in too many different situations for me
to be sure the function does the right thing in all of them. I also couldn't
convince myself that the accounting was correct in all cases. I think
batching updates, from mark_page_accessed() in particular, is a good idea,
but the patch needs a do-over.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
