Re: [HMM 12/15] mm/migrate: new memory migration helper for use with device memory v4

On 6/30/17 5:57 PM, Jerome Glisse wrote:

...

Hi Jerome,

I am working on a sporadic data corruption seen in highly contended use cases. So far, I've been able to re-create a sporadic hang that happens when multiple threads compete to migrate the same page to and from device memory. The reproducer uses only the dummy driver from hmm-next.

Please find the reproducer attached. This is how it hangs on my Intel i7-5930K system (6 cores, 12 hardware threads with SMT):

&&& 2 migrate threads, 2 read threads: STARTING
(EE:84) hmm_buffer_mirror_read error -1
&&& 2 migrate threads, 2 read threads: PASSED
&&& 2 migrate threads, 3 read threads: STARTING
&&& 2 migrate threads, 3 read threads: PASSED
&&& 2 migrate threads, 4 read threads: STARTING
&&& 2 migrate threads, 4 read threads: PASSED
&&& 3 migrate threads, 2 read threads: STARTING
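For readers without the attachment, the shape of the test matrix above (N migrate threads and M read threads all targeting the same page) can be pictured with the userspace sketch below. It is not the attached reproducer and does not touch hmm_dmirror: `shared_page`, `migrate_worker`, `read_worker` and `run_case` are hypothetical names, and the "migration" is only an atomic flag flip standing in for the real ioctl. It is meant to show the contention pattern, nothing more.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define MAX_THREADS 16
#define ITERATIONS  10000

/* Hypothetical stand-in for the single page that every thread targets. */
struct shared_page {
    atomic_int on_device; /* 1 = "device memory", 0 = "system memory" */
    atomic_int value;     /* payload that must survive every migration */
    atomic_int errors;    /* number of reads that saw a wrong payload */
};

/* Stand-in for a migrate thread: repeatedly flips the page location. */
static void *migrate_worker(void *arg)
{
    struct shared_page *pg = arg;

    for (int i = 0; i < ITERATIONS; i++)
        atomic_fetch_xor(&pg->on_device, 1);
    return NULL;
}

/* Stand-in for a read thread: checks the payload while migrations run. */
static void *read_worker(void *arg)
{
    struct shared_page *pg = arg;

    for (int i = 0; i < ITERATIONS; i++)
        if (atomic_load(&pg->value) != 42)
            atomic_fetch_add(&pg->errors, 1);
    return NULL;
}

/* Runs one cell of the matrix; returns 0 on success, -1 on failure. */
static int run_case(int n_migrate, int n_read)
{
    pthread_t tids[MAX_THREADS];
    struct shared_page pg;
    int started = 0;

    if (n_migrate < 0 || n_read < 0 || n_migrate + n_read > MAX_THREADS)
        return -1;

    atomic_init(&pg.on_device, 0);
    atomic_init(&pg.value, 42);
    atomic_init(&pg.errors, 0);

    printf("&&& %d migrate threads, %d read threads: STARTING\n",
           n_migrate, n_read);

    for (int i = 0; i < n_migrate; i++)
        if (pthread_create(&tids[started], NULL, migrate_worker, &pg) == 0)
            started++;
    for (int i = 0; i < n_read; i++)
        if (pthread_create(&tids[started], NULL, read_worker, &pg) == 0)
            started++;

    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    int ok = (started == n_migrate + n_read) &&
             (atomic_load(&pg.errors) == 0);

    printf("&&& %d migrate threads, %d read threads: %s\n",
           n_migrate, n_read, ok ? "PASSED" : "FAILED");
    return ok ? 0 : -1;
}
```

Calling `run_case(2, 2)`, `run_case(2, 3)` and so on from a `main()` walks the same matrix as the output above (build with `-pthread`). In the real test the workers would issue the driver's migrate and read ioctls instead of touching atomics, which is where the hang shows up.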

The kernel log (also attached) shows multiple threads blocked in hmm_vma_fault() and migrate_vma():

[  139.054907] sanity_rmem004  D13528  3997   3818 0x00000000
[  139.054912] Call Trace:
[  139.054914]  __schedule+0x20b/0x6c0
[  139.054916]  schedule+0x36/0x80
[  139.054920]  io_schedule+0x16/0x40
[  139.054923]  __lock_page+0xf2/0x130
[  139.054929]  migrate_vma+0x48a/0xee0
[  139.054933]  dummy_migrate.isra.10+0xd9/0x110 [hmm_dmirror]
[  139.054945]  dummy_fops_unlocked_ioctl+0x1e8/0x330 [hmm_dmirror]
[  139.054954]  do_vfs_ioctl+0x96/0x5a0
[  139.054957]  SyS_ioctl+0x79/0x90
[  139.054960]  entry_SYSCALL_64_fastpath+0x13/0x94
...
[  139.055067] sanity_rmem004  D13136  3999   3818 0x00000000
[  139.055072] Call Trace:
[  139.055074]  __schedule+0x20b/0x6c0
[  139.055076]  schedule+0x36/0x80
[  139.055079]  io_schedule+0x16/0x40
[  139.055083]  wait_on_page_bit+0xee/0x120
[  139.055089]  __migration_entry_wait+0xe8/0x190
[  139.055091]  migration_entry_wait+0x5f/0x70
[  139.055094]  do_swap_page+0x4c7/0x4e0
[  139.055096]  __handle_mm_fault+0x347/0x9d0
[  139.055099]  handle_mm_fault+0x88/0x150
[  139.055103]  hmm_vma_walk_clear+0x8f/0xd0
[  139.055105]  hmm_vma_walk_pmd+0x1ba/0x250
[  139.055109]  __walk_page_range+0x1e8/0x420
[  139.055112]  walk_page_range+0x73/0xf0
[  139.055114]  hmm_vma_fault+0x180/0x260
[  139.055121]  dummy_fault+0xda/0x1f0 [hmm_dmirror]
[  139.055138]  dummy_fops_unlocked_ioctl+0x12c/0x330 [hmm_dmirror]
[  139.055142]  do_vfs_ioctl+0x96/0x5a0
[  139.055145]  SyS_ioctl+0x79/0x90
[  139.055148]  entry_SYSCALL_64_fastpath+0x13/0x94

Please compile and run the attached program this way:

$ ./build.sh
$ sudo ./kload.sh
$ sudo ./run.sh

Thanks!

Evgeny Baskakov
NVIDIA

Attachment: sanity_rmem004_repeated_faults_threaded.tgz
Description: GNU Zip compressed data

[  107.703099] hmm_dmirror loaded THIS IS A DANGEROUS MODULE !!!
[  114.236862] DEVICE PAGE 20400 20400 (0)
[  114.845967] DEVICE PAGE 53323 53323 (0)
[  115.536446] DEVICE PAGE 87401 87401 (0)
[  139.054579] sysrq: SysRq : Show Blocked State
[  139.054658]   task                        PC stack   pid father
[  139.054661] rcu_sched       D15024     8      2 0x00000000
[  139.054669] Call Trace:
[  139.054676]  __schedule+0x20b/0x6c0
[  139.054679]  schedule+0x36/0x80
[  139.054687]  rcu_gp_kthread+0x74/0x770
[  139.054693]  kthread+0x109/0x140
[  139.054697]  ? force_qs_rnp+0x180/0x180
[  139.054700]  ? kthread_park+0x60/0x60
[  139.054705]  ret_from_fork+0x22/0x30
[  139.054707] rcu_bh          D15424     9      2 0x00000000
[  139.054713] Call Trace:
[  139.054716]  __schedule+0x20b/0x6c0
[  139.054718]  schedule+0x36/0x80
[  139.054721]  rcu_gp_kthread+0x74/0x770
[  139.054725]  kthread+0x109/0x140
[  139.054728]  ? force_qs_rnp+0x180/0x180
[  139.054731]  ? kthread_park+0x60/0x60
[  139.054734]  ret_from_fork+0x22/0x30
[  139.054762] sanity_rmem004  D13528  3995   3818 0x00000000
[  139.054767] Call Trace:
[  139.054769]  __schedule+0x20b/0x6c0
[  139.054776]  ? wake_up_q+0x80/0x80
[  139.054778]  schedule+0x36/0x80
[  139.054782]  io_schedule+0x16/0x40
[  139.054789]  __lock_page+0xf2/0x130
[  139.054792]  ? page_cache_tree_insert+0x90/0x90
[  139.054798]  migrate_vma+0x48a/0xee0
[  139.054803]  dummy_migrate.isra.10+0xd9/0x110 [hmm_dmirror]
[  139.054812]  dummy_fops_unlocked_ioctl+0x1e8/0x330 [hmm_dmirror]
[  139.054814]  ? _cond_resched+0x19/0x30
[  139.054819]  ? selinux_file_ioctl+0x114/0x1e0
[  139.054823]  do_vfs_ioctl+0x96/0x5a0
[  139.054826]  SyS_ioctl+0x79/0x90
[  139.054830]  entry_SYSCALL_64_fastpath+0x13/0x94
[  139.054832] RIP: 0033:0x7fc07add61e7
[  139.054834] RSP: 002b:00007fc078cdfd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  139.054836] RAX: ffffffffffffffda RBX: 00007fc078cdfdf0 RCX: 00007fc07add61e7
[  139.054837] RDX: 00007fc078cdfdf0 RSI: 00000000c0104802 RDI: 0000000000000003
[  139.054839] RBP: 00007ffde7ffd470 R08: 0000000000000000 R09: 00007fc078ce0700
[  139.054840] R10: 00007fc078ce09d0 R11: 0000000000000246 R12: 00007fc078cdfdf8
[  139.054841] R13: 0000000000000000 R14: 0000000000000010 R15: 00007fc078ce0700
[  139.054843] sanity_rmem004  D13304  3996   3818 0x00000000
[  139.054848] Call Trace:
[  139.054851]  __schedule+0x20b/0x6c0
[  139.054853]  schedule+0x36/0x80
[  139.054856]  io_schedule+0x16/0x40
[  139.054860]  __lock_page+0xf2/0x130
[  139.054863]  ? page_cache_tree_insert+0x90/0x90
[  139.054866]  migrate_vma+0x48a/0xee0
[  139.054870]  dummy_migrate.isra.10+0xd9/0x110 [hmm_dmirror]
[  139.054877]  ? avc_has_extended_perms+0xda/0x480
[  139.054881]  dummy_fops_unlocked_ioctl+0x1e8/0x330 [hmm_dmirror]
[  139.054883]  ? _cond_resched+0x19/0x30
[  139.054887]  ? selinux_file_ioctl+0x114/0x1e0
[  139.054890]  do_vfs_ioctl+0x96/0x5a0
[  139.054893]  SyS_ioctl+0x79/0x90
[  139.054896]  entry_SYSCALL_64_fastpath+0x13/0x94
[  139.054897] RIP: 0033:0x7fc07add61e7
[  139.054898] RSP: 002b:00007fc0794e0d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  139.054901] RAX: ffffffffffffffda RBX: 00007fc0794e0df0 RCX: 00007fc07add61e7
[  139.054902] RDX: 00007fc0794e0df0 RSI: 00000000c0104802 RDI: 0000000000000003
[  139.054903] RBP: 00007ffde7ffd470 R08: 0000000000000000 R09: 00007fc0794e1700
[  139.054904] R10: 00007fc0794e19d0 R11: 0000000000000246 R12: 00007fc0794e0df8
[  139.054905] R13: 0000000000000000 R14: 0000000000000010 R15: 00007fc0794e1700
[  139.054907] sanity_rmem004  D13528  3997   3818 0x00000000
[  139.054912] Call Trace:
[  139.054914]  __schedule+0x20b/0x6c0
[  139.054916]  schedule+0x36/0x80
[  139.054920]  io_schedule+0x16/0x40
[  139.054923]  __lock_page+0xf2/0x130
[  139.054927]  ? page_cache_tree_insert+0x90/0x90
[  139.054929]  migrate_vma+0x48a/0xee0
[  139.054933]  dummy_migrate.isra.10+0xd9/0x110 [hmm_dmirror]
[  139.054942]  ? copy_user_enhanced_fast_string+0x7/0x10
[  139.054945]  dummy_fops_unlocked_ioctl+0x1e8/0x330 [hmm_dmirror]
[  139.054947]  ? _cond_resched+0x19/0x30
[  139.054951]  ? selinux_file_ioctl+0x114/0x1e0
[  139.054954]  do_vfs_ioctl+0x96/0x5a0
[  139.054957]  SyS_ioctl+0x79/0x90
[  139.054960]  entry_SYSCALL_64_fastpath+0x13/0x94
[  139.054961] RIP: 0033:0x7fc07add61e7
[  139.054962] RSP: 002b:00007fc079ce1d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  139.054964] RAX: ffffffffffffffda RBX: 00007fc079ce1df0 RCX: 00007fc07add61e7
[  139.054965] RDX: 00007fc079ce1df0 RSI: 00000000c0104802 RDI: 0000000000000003
[  139.054966] RBP: 00007ffde7ffd470 R08: 0000000000000000 R09: 00007fc079ce2700
[  139.054968] R10: 00007fc079ce29d0 R11: 0000000000000246 R12: 00007fc079ce1df8
[  139.054969] R13: 0000000000000000 R14: 0000000000000010 R15: 00007fc079ce2700
[  139.054971] sanity_rmem004  D13136  3998   3818 0x00000000
[  139.054975] Call Trace:
[  139.054977]  __schedule+0x20b/0x6c0
[  139.054979]  schedule+0x36/0x80
[  139.054983]  io_schedule+0x16/0x40
[  139.054986]  wait_on_page_bit+0xee/0x120
[  139.054990]  ? page_cache_tree_insert+0x90/0x90
[  139.054993]  __migration_entry_wait+0xe8/0x190
[  139.054995]  migration_entry_wait+0x5f/0x70
[  139.054998]  do_swap_page+0x4c7/0x4e0
[  139.055001]  __handle_mm_fault+0x347/0x9d0
[  139.055004]  handle_mm_fault+0x88/0x150
[  139.055008]  hmm_vma_walk_clear+0x8f/0xd0
[  139.055010]  hmm_vma_walk_pmd+0x1ba/0x250
[  139.055015]  __walk_page_range+0x1e8/0x420
[  139.055018]  walk_page_range+0x73/0xf0
[  139.055020]  hmm_vma_fault+0x180/0x260
[  139.055023]  ? hmm_vma_walk_hole+0xd0/0xd0
[  139.055024]  ? hmm_vma_get_pfns+0x1b0/0x1b0
[  139.055028]  dummy_fault+0xda/0x1f0 [hmm_dmirror]
[  139.055033]  ? __kernel_map_pages+0x70/0xe0
[  139.055038]  ? __alloc_pages_nodemask+0x11b/0x240
[  139.055041]  ? dummy_pt_walk+0x209/0x2f0 [hmm_dmirror]
[  139.055044]  ? dummy_update+0x60/0x60 [hmm_dmirror]
[  139.055047]  dummy_fops_unlocked_ioctl+0x12c/0x330 [hmm_dmirror]
[  139.055050]  do_vfs_ioctl+0x96/0x5a0
[  139.055054]  SyS_ioctl+0x79/0x90
[  139.055057]  entry_SYSCALL_64_fastpath+0x13/0x94
[  139.055058] RIP: 0033:0x7fc07add61e7
[  139.055059] RSP: 002b:00007fc07ace3c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  139.055061] RAX: ffffffffffffffda RBX: 00007fc07ace4700 RCX: 00007fc07add61e7
[  139.055062] RDX: 00007fc07ace3cd0 RSI: 00000000c0284800 RDI: 0000000000000003
[  139.055063] RBP: 00007ffde7ffd3b0 R08: 00007fc07ace3ef0 R09: 00007fc07ace4700
[  139.055064] R10: 00007ffde7ffd470 R11: 0000000000000246 R12: 0000000000000000
[  139.055066] R13: 0000000000000000 R14: 00007fc07ace49c0 R15: 00007fc07ace4700
[  139.055067] sanity_rmem004  D13136  3999   3818 0x00000000
[  139.055072] Call Trace:
[  139.055074]  __schedule+0x20b/0x6c0
[  139.055076]  schedule+0x36/0x80
[  139.055079]  io_schedule+0x16/0x40
[  139.055083]  wait_on_page_bit+0xee/0x120
[  139.055086]  ? page_cache_tree_insert+0x90/0x90
[  139.055089]  __migration_entry_wait+0xe8/0x190
[  139.055091]  migration_entry_wait+0x5f/0x70
[  139.055094]  do_swap_page+0x4c7/0x4e0
[  139.055096]  __handle_mm_fault+0x347/0x9d0
[  139.055099]  handle_mm_fault+0x88/0x150
[  139.055103]  hmm_vma_walk_clear+0x8f/0xd0
[  139.055105]  hmm_vma_walk_pmd+0x1ba/0x250
[  139.055109]  __walk_page_range+0x1e8/0x420
[  139.055112]  walk_page_range+0x73/0xf0
[  139.055114]  hmm_vma_fault+0x180/0x260
[  139.055116]  ? hmm_vma_walk_hole+0xd0/0xd0
[  139.055118]  ? hmm_vma_get_pfns+0x1b0/0x1b0
[  139.055121]  dummy_fault+0xda/0x1f0 [hmm_dmirror]
[  139.055125]  ? __kernel_map_pages+0x70/0xe0
[  139.055129]  ? __alloc_pages_nodemask+0x11b/0x240
[  139.055133]  ? dummy_pt_walk+0x209/0x2f0 [hmm_dmirror]
[  139.055135]  ? dummy_update+0x60/0x60 [hmm_dmirror]
[  139.055138]  dummy_fops_unlocked_ioctl+0x12c/0x330 [hmm_dmirror]
[  139.055142]  do_vfs_ioctl+0x96/0x5a0
[  139.055145]  SyS_ioctl+0x79/0x90
[  139.055148]  entry_SYSCALL_64_fastpath+0x13/0x94
[  139.055149] RIP: 0033:0x7fc07add61e7
[  139.055150] RSP: 002b:00007fc07a4e2c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  139.055152] RAX: ffffffffffffffda RBX: 00007fc07a4e3700 RCX: 00007fc07add61e7
[  139.055153] RDX: 00007fc07a4e2cd0 RSI: 00000000c0284800 RDI: 0000000000000003
[  139.055154] RBP: 00007ffde7ffd3b0 R08: 00007fc07a4e2ef0 R09: 00007fc07a4e3700
[  139.055155] R10: 00007ffde7ffd470 R11: 0000000000000246 R12: 0000000000000000
[  139.055156] R13: 0000000000000000 R14: 00007fc07a4e39c0 R15: 00007fc07a4e3700
