I notice there's a Bug 31142 "Large write to USB stick freezes" discussion happening (which I've not digested), for which Andrea is proposing a patch which reminds me of this one. Thought I'd better throw this into the mix for consideration. I'd not sent it in yet, because I only see the problem on one machine, and then only with a shmem patch I'm working up; but can't see how that patch would actually be necessary to create the problem. It happens in my extfs-on-loop-on-tmpfs swapping tests, when copying in the kernel tree. I believe the relevant traces are these three: I notice sync_supers there every time it hangs, but I guess it comes along after, and gets stuck on the same page which cp is waiting for. D sync_supers: schedule +0x670 io_schedule +0x50 sync_buffer +0x68 __wait_on_bit +0x90 out_of_line_wait_on_bit +0x98 __wait_on_buffer +0x30 __sync_dirty_buffer +0xc0 ext4_commit_super +0x2c4 ext4_write_super +0x28 sync_supers +0xdc bdi_sync_supers +0x40 kthread +0xac kernel_thread +0x54 D loop0: schedule +0x670 io_schedule +0x50 sync_page +0x84 __wait_on_bit +0x90 wait_on_page_bit +0xa4 unmap_and_move +0x180 migrate_pages +0xbc compact_zone +0xbc compact_zone_order +0xc8 try_to_compact_pages +0x104 __alloc_pages_direct_compact +0xc0 __alloc_pages_nodemask +0x68c allocate_slab +0x84 new_slab +0x58 __slab_alloc +0x1ec kmem_cache_alloc +0x7c radix_tree_preload +0x94 add_to_page_cache_locked +0x78 shmem_getpage +0x208 pagecache_write_begin +0x2c do_lo_send_aops +0xc0 do_bio_filebacked +0x11c loop_thread +0x204 kthread +0xac kernel_thread +0x54 D cp: schedule +0x670 io_schedule +0x50 sync_buffer +0x68 __wait_on_bit +0x90 out_of_line_wait_on_bit +0x98 __wait_on_buffer +0x30 ext4_find_entry +0x230 ext4_lookup +0x44 d_alloc_and_lookup +0x74 do_last +0xe0 do_filp_open +0x2b8 do_sys_open +0x8c compat_sys_open +0x24 syscall_exit +0x0 I believe (but haven't verified for sure) that what happens is that compaction (when trying to allocate a radix_tree node - SLUB asks for order 2 - in the loop0 daemon trace) chooses the cp page under writeback which is waiting for loop0 to write it. So I've extended your earlier PF_MEMALLOC patch to prevent waiting for writeback as well as waiting for pagelock. And I've never seen the hang again since putting this patch in. Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> --- mm/migrate.c | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) --- 2.6.38/mm/migrate.c 2011-03-14 18:20:32.000000000 -0700 +++ linux/mm/migrate.c 2011-03-15 06:36:26.000000000 -0700 @@ -637,29 +637,33 @@ static int unmap_and_move(new_page_t get if (unlikely(split_huge_page(page))) goto move_newpage; + /* + * It's not safe for direct compaction to call lock_page. + * For example, during page readahead pages are added locked + * to the LRU. Later, when the IO completes the pages are + * marked uptodate and unlocked. However, the queueing + * could be merging multiple pages for one bio (e.g. + * mpage_readpages). If an allocation happens for the + * second or third page, the process can end up locking + * the same page twice and deadlocking. Rather than + * trying to be clever about what pages can be locked, + * avoid the use of lock_page for direct compaction + * altogether. + * + * Nor is it safe for direct compaction to wait_on_page_writeback: + * we might be trying to allocate on behalf of that writeback (e.g. + * slub allocating an order-2 page for a radix_tree node for the + * loop device below, might target that very page under writeback). + */ + if (current->flags & PF_MEMALLOC) + force = 0; + /* prepare cgroup just returns 0 or -ENOMEM */ rc = -EAGAIN; if (!trylock_page(page)) { if (!force) goto move_newpage; - - /* - * It's not safe for direct compaction to call lock_page. - * For example, during page readahead pages are added locked - * to the LRU. Later, when the IO completes the pages are - * marked uptodate and unlocked. However, the queueing - * could be merging multiple pages for one bio (e.g. - * mpage_readpages). If an allocation happens for the - * second or third page, the process can end up locking - * the same page twice and deadlocking. Rather than - * trying to be clever about what pages can be locked, - * avoid the use of lock_page for direct compaction - * altogether. - */ - if (current->flags & PF_MEMALLOC) - goto move_newpage; - lock_page(page); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>