On Tue, Jan 11, 2011 at 11:14:20AM +0900, KAMEZAWA Hiroyuki wrote: > On Mon, 10 Jan 2011 15:56:37 -0800 (PST) > Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > > On Mon, 10 Jan 2011, Mel Gorman wrote: > > > > > > ==== CUT HERE ==== > > > mm: compaction: Avoid potential deadlock for readahead pages and direct compaction > > > > > > Hugh Dickins reported that two instances of cp were locking up when > > > running on ppc64 in a memory constrained environment. The deadlock > > > appears to apply to readahead pages. When reading ahead, the pages are > > > added locked to the LRU and queued for IO. The process is also inserting > > > pages into the page cache and so is calling radix_preload and entering > > > the page allocator. When SLUB is used, this can result in direct > > > compaction finding the page that was just added to the LRU but still > > > locked by the current process leading to deadlock. > > > > > > This patch avoids locking pages for migration that might already be > > > locked by the current process. Ideally it would only apply for direct > > > compaction but compaction does not set PF_MEMALLOC so there is no way > > > currently of identifying a process in direct compaction. A process-flag > > > could be added but is likely to be overkill. > > > > > > Reported-by: Hugh Dickins <hughd@xxxxxxxxxx> > > > Signed-off-by: Mel Gorman <mel@xxxxxxxxx> > > > > But whilst I'm hugely grateful to you for working this out, > > I'm sorry to say that I'm not keen on your patch! > > > > PageMappedToDisk is an fs thing, not an mm thing (migrate.c copies it > > over but that's all), and I don't like to see you rely on it. I expect > > it works well for ext234 and many others that use mpage_readpages, > > but what of btrfs_readpages? I couldn't see any use of PageMappedToDisk > > there. I suppose you could insist it use it too, but... > > > > How about setting and clearing PF_MEMALLOC around the call to > > try_to_compact_pages() in __alloc_pages_direct_compact(), and > > skipping the lock_page when PF_MEMALLOC is set, whatever the > > page flags? That would mimic __alloc_pages_direct_reclaim > > (hmm, reclaim_state??); and I've a suspicion that this readahead > > deadlock may not be the only one lurking. > > > > Hugh > > Hmm, in migrate_pages() > == > int migrate_pages(struct list_head *from, > new_page_t get_new_page, unsigned long private, bool offlining, > bool sync) > { > > ... > for(pass = 0; pass < 10 && retry; pass++) { > retry = 0; > ... > > rc = unmap_and_move(get_new_page, private, > page, pass > 2, offlining, > sync); > > == > > do force locking at pass > 2. Considering direct-compaction, pass > 2 is not > required I think because it can do compaction on other range of pages. We have two phase of direct compaction with async and sync. It can be async patch of compaction. > > IOW, what it requires is a range of pages for specified order, but a range of > pages which is specfied by users. How about skipping pass > 2 when > it's called by direct compaction ? quick-scan of the next range may be helpful > rather than waiting on lock. If async migration fail to get a free big order, it try to compact with sync. In that case, waitting of locked page makes sense to me. So unconditional skip in pass > 2 isn't good. I vote Hugh's idea. It would just skip if only current context hold lock. > > Thanks, > -Kame > > > > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxxx For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>