Rebase to https://github.com/jgunthorpe/linux.git hmm branch, need some changes because of interface hmm_range_register change. Then run a quick amdgpu_test. Test is finished, result is ok. But there is below kernel BUG message, seems hmm_free_rcu calls down_write..... [ 1171.919921] BUG: sleeping function called from invalid context at /home/yangp/git/compute_staging/kernel/kernel/locking/rwsem.c:65 [ 1171.919933] in_atomic(): 1, irqs_disabled(): 0, pid: 53, name: kworker/1:1 [ 1171.919938] 2 locks held by kworker/1:1/53: [ 1171.919940] #0: 000000001c7c19d4 ((wq_completion)rcu_gp){+.+.}, at: process_one_work+0x20e/0x630 [ 1171.919951] #1: 00000000923f2cfa ((work_completion)(&sdp->work)){+.+.}, at: process_one_work+0x20e/0x630 [ 1171.919959] CPU: 1 PID: 53 Comm: kworker/1:1 Tainted: G W 5.2.0-rc1-kfd-yangp #196 [ 1171.919961] Hardware name: ASUS All Series/Z97-PRO(Wi-Fi ac)/USB 3.1, BIOS 9001 03/07/2016 [ 1171.919965] Workqueue: rcu_gp srcu_invoke_callbacks [ 1171.919968] Call Trace: [ 1171.919974] dump_stack+0x67/0x9b [ 1171.919980] ___might_sleep+0x149/0x230 [ 1171.919985] down_write+0x1c/0x70 [ 1171.919989] hmm_free_rcu+0x24/0x80 [ 1171.919993] srcu_invoke_callbacks+0xc9/0x150 [ 1171.920000] process_one_work+0x28e/0x630 [ 1171.920008] worker_thread+0x39/0x3f0 [ 1171.920014] ? process_one_work+0x630/0x630 [ 1171.920017] kthread+0x11c/0x140 [ 1171.920021] ? kthread_park+0x90/0x90 [ 1171.920026] ret_from_fork+0x24/0x30 Philip On 2019-06-12 1:54 p.m., Kuehling, Felix wrote: > [+Philip] > > Hi Jason, > > I'm out of the office this week. > > Hi Philip, can you give this a go? Not sure how much you've been > following this patch series review. Message or call me on Skype to > discuss any questions. > > Thanks, > Felix > > On 2019-06-11 12:48, Jason Gunthorpe wrote: >> On Thu, Jun 06, 2019 at 03:44:27PM -0300, Jason Gunthorpe wrote: >>> From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> >>> >>> For hmm.git: >>> >>> This patch series arised out of discussions with Jerome when looking at the >>> ODP changes, particularly informed by use after free races we have already >>> found and fixed in the ODP code (thanks to syzkaller) working with mmu >>> notifiers, and the discussion with Ralph on how to resolve the lifetime model. >>> >>> Overall this brings in a simplified locking scheme and easy to explain >>> lifetime model: >>> >>> If a hmm_range is valid, then the hmm is valid, if a hmm is valid then the mm >>> is allocated memory. >>> >>> If the mm needs to still be alive (ie to lock the mmap_sem, find a vma, etc) >>> then the mmget must be obtained via mmget_not_zero(). >>> >>> Locking of mm->hmm is shifted to use the mmap_sem consistently for all >>> read/write and unlocked accesses are removed. >>> >>> The use unlocked reads on 'hmm->dead' are also eliminated in favour of using >>> standard mmget() locking to prevent the mm from being released. Many of the >>> debugging checks of !range->hmm and !hmm->mm are dropped in favour of poison - >>> which is much clearer as to the lifetime intent. >>> >>> The trailing patches are just some random cleanups I noticed when reviewing >>> this code. >>> >>> This v2 incorporates alot of the good off list changes & feedback Jerome had, >>> and all the on-list comments too. However, now that we have the shared git I >>> have kept the one line change to nouveau_svm.c rather than the compat >>> funtions. >>> >>> I believe we can resolve this merge in the DRM tree now and keep the core >>> mm/hmm.c clean. DRM maintainers, please correct me if I'm wrong. >>> >>> It is on top of hmm.git, and I have a git tree of this series to ease testing >>> here: >>> >>> https://github.com/jgunthorpe/linux/tree/hmm >>> >>> There are still some open locking issues, as I think this remains unaddressed: >>> >>> https://lore.kernel.org/linux-mm/20190527195829.GB18019@xxxxxxxxxxxx/ >>> >>> I'm looking for some more acks, reviews and tests so this can move ahead to >>> hmm.git. >> AMD Folks, this is looking pretty good now, can you please give at >> least a Tested-by for the new driver code using this that I see in >> linux-next? >> >> Thanks, >> Jason