On 7/13/17 1:16 PM, Jerome Glisse wrote:
...
Hi Jerome,

I have hit another kind of hang. Briefly, if a not-yet-allocated page faults on the CPU during migration to device memory, any subsequent migration of that page will fail. This can happen when a CPU page fault occurs immediately after migrate_vma() starts unmapping the pages to migrate.
Please find attached a reproducer based on the sample driver. In the hmm_test() function, an HMM_DMIRROR_MIGRATE request is issued from a separate thread for not-yet-allocated pages (coming from malloc). At the same time, an HMM_DMIRROR_READ request is made for the same pages. This results in a sporadic app-side hang, because a random number of pages never migrate to device memory.
Note that if the pages are touched (initialized with data) prior to that, everything works as expected: all HMM_DMIRROR_READ and HMM_DMIRROR_MIGRATE requests eventually succeed. See comments in the hmm_test() function.
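For readers without the attachment, the shape of the test can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the attached code: request_migrate() and request_read() are hypothetical stand-ins for the HMM_DMIRROR_MIGRATE and HMM_DMIRROR_READ requests (the sample driver's ioctl interface is not reproduced here), an anonymous mmap stands in for the malloc'ed buffer, and a plain CPU read of each page stands in for the fault the read request provokes. It does not trigger the hang by itself; it only shows the two concurrent requests and the touched/untouched variants.

```python
import mmap
import threading

PAGE_SIZE = mmap.PAGESIZE
NPAGES = 16  # arbitrary size chosen for the sketch


def request_migrate(buf, npages, log):
    # Hypothetical stand-in for HMM_DMIRROR_MIGRATE: the real request would
    # ask the driver to migrate these pages to device memory.
    for page in range(npages):
        log.append(("migrate", page))


def request_read(buf, npages, log):
    # Hypothetical stand-in for HMM_DMIRROR_READ: reading one byte per page
    # from the CPU approximates the fault on a not-yet-allocated page.
    for page in range(npages):
        _ = buf[page * PAGE_SIZE]
        log.append(("read", page))


def run_race(touch_first):
    # Anonymous mapping: pages are not populated until first access.
    buf = mmap.mmap(-1, NPAGES * PAGE_SIZE)
    if touch_first:
        # The variant that works as expected: initialize every page first.
        for page in range(NPAGES):
            buf[page * PAGE_SIZE] = 1
    log = []
    migrator = threading.Thread(target=request_migrate, args=(buf, NPAGES, log))
    reader = threading.Thread(target=request_read, args=(buf, NPAGES, log))
    migrator.start()  # migration runs in a separate thread ...
    reader.start()    # ... while the read request targets the same pages
    migrator.join()
    reader.join()
    buf.close()
    return log
```

Calling run_race(False) corresponds to the failing case (pages never touched before the two requests race), and run_race(True) to the case where the pages are initialized first.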
Thanks!

--
Evgeny Baskakov
NVIDIA
Attachment:
sanity_rmem004_repeated_faults_threaded_notallocated.tgz
Description: GNU Zip compressed data