On 01/15/2015 08:21 PM, Sasha Levin wrote:
> On 01/15/2015 04:00 PM, Dave Hansen wrote:
>> I/O devices are only getting faster. In fact, they're getting closer
>> and closer to memory in latency and bandwidth. But the VM is still
>> designed to do very orderly and costly procedures to reclaim memory, and
>> the existing algorithms don't parallelize particularly well. They hit
>> contention on mmap_sem or the lru locks well before all of the CPU
>> horsepower that we have can be brought to bear on reclaim.
>>
>> Once the latency to bring pages in and out of storage becomes low
>> enough, reclaiming the _right_ pages becomes much less important than
>> doing something useful with the CPU horsepower that we have.
>>
>> We need to talk about ways to do reclaim with lower CPU overhead and to
>> parallelize more effectively.
>>
>> There has been some research in this area by some folks at Intel and we
>> could quickly summarize what has been learned so far to help kick off a
>> discussion.
>
> I was actually planning to bring that up. Trinity can cause enough stress
> to a system that the hang watchdog triggers (with a 10 minute timeout!)
> inside reclaim code.

Something like this:

[ 5412.398971] trinity-c23 R running task 10768 29063 30295 0x10000006
[ 5412.400111] ffff88046127b178 0000000000000ec3 0000000000000000 0000000000000000
[ 5412.401404] ffffe8fff263cd6e 0000000000000000 0000000000000000 0000000000000000
[ 5412.402695] 0000000000000000 ffff88076bfff4c8 0000000000001099 0000000000000000
[ 5412.404084] Call Trace:
[ 5412.404495] [<ffffffff916124f2>] ? _raw_spin_unlock_irqrestore+0xa2/0xf0
[ 5412.405611] [<ffffffff915fbe25>] schedule+0x55/0x280
[ 5412.406435] [<ffffffff81903e72>] throttle_direct_reclaim+0x432/0x660
[ 5412.407530] [<ffffffff81552160>] ? __init_waitqueue_head+0xe0/0xe0
[ 5412.408565] [<ffffffff81916007>] try_to_free_pages+0xf7/0x460
[ 5412.409528] [<ffffffff818e25e1>] __alloc_pages_nodemask+0xc01/0x1940
[ 5412.410589] [<ffffffff81a05646>] alloc_pages_vma+0x216/0x6f0
[ 5412.411572] [<ffffffff8191a62a>] ? shmem_alloc_page+0x9a/0x170
[ 5412.412556] [<ffffffff8191a62a>] shmem_alloc_page+0x9a/0x170
[ 5412.413507] [<ffffffff818bc241>] ? find_get_entry+0x191/0x2c0
[ 5412.414475] [<ffffffff818bc0b5>] ? find_get_entry+0x5/0x2c0
[ 5412.415513] [<ffffffff818bd02b>] ? find_lock_entry+0x2b/0x140
[ 5412.416474] [<ffffffff81924f16>] shmem_getpage_gfp+0xde6/0x1710
[ 5412.417461] [<ffffffff81926eb1>] shmem_fault+0x1a1/0x7d0
[ 5412.418353] [<ffffffff819745dd>] __do_fault+0xad/0x2a0
[ 5412.419285] [<ffffffff8197e6c1>] handle_mm_fault+0x1331/0x5440
[ 5412.420121] [<ffffffff812f59d3>] __do_page_fault+0x2d3/0xfb0
[ 5412.420877] [<ffffffff81579bc7>] ? mark_held_locks+0x117/0x2b0
[ 5412.421688] [<ffffffff81571fcd>] ? trace_hardirqs_off+0xd/0x10
[ 5412.422491] [<ffffffff812f6828>] trace_do_page_fault+0xc8/0x420
[ 5412.423348] [<ffffffff812da7d3>] do_async_page_fault+0x83/0x120
[ 5412.424175] [<ffffffff91614d68>] async_page_fault+0x28/0x30
[ 5412.424920] [<ffffffff81960e5e>] ? iov_iter_fault_in_readable+0x17e/0x280
[ 5412.425851] [<ffffffff814a1045>] ? ___might_sleep+0x2a5/0x420
[ 5412.426645] [<ffffffff818baa49>] generic_perform_write+0x179/0x5b0
[ 5412.427573] [<ffffffff81b2b267>] ? __mnt_drop_write+0x57/0xa0
[ 5412.428365] [<ffffffff818c23fc>] __generic_file_write_iter+0x59c/0x13e0
[ 5412.429292] [<ffffffff81aa1c7c>] ? rw_copy_check_uvector+0x5c/0x470
[ 5412.430153] [<ffffffff81a8f964>] ? kasan_poison_shadow+0x34/0x40
[ 5412.431062] [<ffffffff818c3319>] generic_file_write_iter+0xd9/0x510
[ 5412.431892] [<ffffffff81a9bec0>] ? new_sync_read+0x220/0x220
[ 5412.432675] [<ffffffff81a9c17d>] do_iter_readv_writev+0x9d/0x190
[ 5412.433504] [<ffffffff81aa22c9>] do_readv_writev+0x239/0xe10
[ 5412.434294] [<ffffffff818c3240>] ? __generic_file_write_iter+0x13e0/0x13e0
[ 5412.435300] [<ffffffff8174cb7e>] ? acct_account_cputime+0x6e/0xa0
[ 5412.435942] [<ffffffff818c3240>] ? __generic_file_write_iter+0x13e0/0x13e0
[ 5412.436678] [<ffffffff818b5fa7>] ? context_tracking_user_exit+0xc7/0x330
[ 5412.437395] [<ffffffff8157a281>] ? trace_hardirqs_on_caller+0x521/0x850
[ 5412.438097] [<ffffffff81aa3033>] vfs_writev+0x93/0x100
[ 5412.438632] [<ffffffff81aa39ba>] SyS_pwritev+0x11a/0x200

Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .