Hi!

> >> Oh and did I ask in this thread for /proc/zoneinfo yet? :)
> >
> > Using that same kernel[1], got again into a race, gathered a few more data.
> >
> > This time, I had 1x "urpmq" process [2] hung at 100% CPU, when "kwin" got
> > apparently blocked (100% CPU, too) trying to resize a GUI window. I suppose
> > the resizing operation would mean heavy memory alloc/free.
> >
> > The rest of the system was responsive, I could easily get a console, login,
> > gather the files.. Then, I have *killed* -9 the "urpmq" process, which solved
> > the race and my system is still alive! "kwin" is still running, returned to
> > regular CPU load.
> >
> > Attached are traces from SysRq+l (pressed a few times, wanted to "snapshot" the
> > stack) and /proc/zoneinfo + /proc/vmstat
> >
> > Bisection is not yet meaningful, IMHO, because I cannot be sure that "good"
> > points are really free from this issue. I'd estimate that each test would take
> > +3 days, unless I really find a deterministic way to reproduce the issue.
>
> Hi,
>
> I think I finally found the cause by staring into the code... CCing
> people from all 4 separate threads I know about this issue.
> The problem with finding the cause was that the first report I got from
> Markus was about isolate_freepages_block() overhead, and later Norbert
> reported that reverting a patch for isolate_freepages* helped. But the
> problem seems to be that although the loop in isolate_migratepages() exits
> because the scanners almost meet (they are within the same pageblock), they
> don't truly meet, therefore compact_finished() decides to continue, but
> isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
> made this situation possible, as the free scanner pfn can now point to the
> middle of a pageblock.

Ok, it seems it happened a second time now, again shortly after
resume. I guess I should apply your patch after all.

(Or... instead it should go to Linus ASAP -- it fixes a known problem
that is affecting people, and we want it in soon in case it is not a
complete fix.)

Dmesg is in the attachment, perhaps it helps.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Attachment:
delme.gz
Description: application/gzip
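
For readers following the quoted analysis, here is a toy model of the
livelock it describes. This is only a sketch and not the kernel code:
everything prefixed toy_ is hypothetical, the pageblock size of 512
pages is an assumption, and the real logic in mm/compaction.c is more
involved. It only shows how an isolate step that gives up when both
scanners share a pageblock, combined with a finish check that waits for
the scanners to truly meet, can spin without making progress.

```c
#include <stdbool.h>

/* Assumed pageblock size in pages; the real value depends on the config. */
#define TOY_PAGEBLOCK_NR_PAGES 512UL

/* Hypothetical, heavily simplified stand-in for the compaction state. */
struct toy_compact {
	unsigned long migrate_pfn;	/* migration scanner, moves upwards */
	unsigned long free_pfn;		/* free scanner, moves downwards */
};

static unsigned long toy_pageblock_start(unsigned long pfn)
{
	return pfn & ~(TOY_PAGEBLOCK_NR_PAGES - 1);
}

/*
 * Toy isolate step: it refuses to scan once both scanners are inside
 * the same pageblock ("almost meet") and returns without progress.
 */
static bool toy_isolate_migratepages(struct toy_compact *cc)
{
	if (toy_pageblock_start(cc->migrate_pfn) >=
	    toy_pageblock_start(cc->free_pfn))
		return false;

	/* Otherwise pretend one whole pageblock was scanned. */
	cc->migrate_pfn = toy_pageblock_start(cc->migrate_pfn) +
			  TOY_PAGEBLOCK_NR_PAGES;
	return true;
}

/* Toy finish check: only done when the scanners truly meet. */
static bool toy_compact_finished(const struct toy_compact *cc)
{
	return cc->free_pfn <= cc->migrate_pfn;
}

/*
 * Runs the toy loop and returns how many iterations made no progress.
 * The max_stalls cap exists only so that the model terminates; the
 * loop described in the quoted mail has no such cap.
 */
static unsigned long toy_compact_zone(struct toy_compact *cc,
				      unsigned long max_stalls)
{
	unsigned long stalled = 0;

	while (!toy_compact_finished(cc) && stalled < max_stalls) {
		if (!toy_isolate_migratepages(cc))
			stalled++;
	}
	return stalled;
}
```

With free_pfn on a pageblock boundary (for example migrate_pfn = 0,
free_pfn = 512) the toy loop finishes with no stalled iterations. With
free_pfn in the middle of a pageblock (for example free_pfn = 612) it
stalls until the cap is hit, which corresponds to the 100% CPU spin
reported in this thread.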