On Mon, Jan 17, 2011 at 10:02:47AM -0500, Chris Mason wrote:
> Excerpts from Chris Mason's message of 2011-01-17 09:07:40 -0500:
> > [ various crashes under load with current git ]
> > >
> > I did have CONFIG_COMPACTION off for my latest reproduce.  The last two
> > have been corruption on the page->lru lists, maybe that'll help narrow
> > our bisect pool down.
>
> I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> than any runs in the past.
>
> I'll give this a few hours but they seem the most related to my various
> crashes so far.

I went through the new batched activation code.

Shaohua, can you explain to me why the following sequence is not
possible?

1. CPU A and B schedule activation of a page (PG_lru && !PG_active)

2. CPU A flushes the page to the active list (PG_lru && PG_active)

3. CPU A isolates the page for scanning/migration and puts it on
   private list (!PG_lru && PG_active)

4. CPU B flushes the page to the active list (!PG_lru && PG_active),
   the deferred activation code now assumes putback mode and adds the
   page to the active list, thus corrupting the link to the private
   list of CPU A

5. CPU A does list_del() from the private list (like unmap_and_move()
   does) and trips up on the corruption

	Hannes
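
To make the five steps above concrete, here is a minimal userspace sketch
in C.  It is not kernel code: struct page, flush_activation() and
simulate_race() are hypothetical stand-ins, the list helpers only imitate
the kernel's list_head, and the "putback mode" branch models the behaviour
as described in step 4 rather than quoting the actual implementation.  The
two CPUs are replayed sequentially in one thread, so it shows what the
interleaving would do to the lists if it can happen, not that it can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

/* True if the walk from head returns to head within max_nodes hops and
 * every link on the way is symmetric (node->next->prev == node). */
static bool list_is_consistent(const struct list_head *head, int max_nodes)
{
    const struct list_head *pos = head;

    for (int i = 0; i <= max_nodes; i++) {
        if (pos->next == NULL || pos->next->prev != pos)
            return false;
        pos = pos->next;
        if (pos == head)
            return true;
    }
    return false;
}

/* Hypothetical model of a page: two flags and its LRU linkage. */
struct page { bool lru; bool active; struct list_head link; };

/* Assumed model of the deferred activation flush described in step 4. */
static void flush_activation(struct page *p, struct list_head *active)
{
    if (p->lru && !p->active) {
        list_del(&p->link);             /* leave the inactive list */
        p->active = true;
        list_add(&p->link, active);
    } else if (!p->lru) {
        /* Assumed putback mode: link the page without unlinking it. */
        p->lru = true;
        list_add(&p->link, active);
    }
}

/* Replays steps 1-5; returns true if CPU A's private list ends up corrupt. */
static bool simulate_race(void)
{
    struct list_head inactive, active, private_a;
    struct page pg = { .lru = true, .active = false };

    list_init(&inactive);
    list_init(&active);
    list_init(&private_a);
    list_add(&pg.link, &inactive);      /* 1. both CPUs queue activation */

    flush_activation(&pg, &active);     /* 2. CPU A flushes */
    assert(pg.lru && pg.active);

    list_del(&pg.link);                 /* 3. CPU A isolates the page */
    pg.lru = false;
    list_add(&pg.link, &private_a);
    assert(list_is_consistent(&private_a, 4));

    flush_activation(&pg, &active);     /* 4. CPU B flushes a stale entry */
    bool corrupt = !list_is_consistent(&private_a, 4);

    list_del(&pg.link);                 /* 5. CPU A unlinks: hits 'active' */
    bool stale = (private_a.next == &pg.link);

    return corrupt && stale;
}
```

Calling simulate_race() from a small main() shows the effect in this
model: after step 4 the page's link points into the active list while
CPU A's private list head still points at the page, and the list_del() in
step 5 unlinks it from the active list instead, leaving the private list
with a stale entry.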