Cool, thanks!

On Mon, 14 May 2018, 19:07 Felix Kuehling <felix.kuehling at amd.com> wrote:

> On 2018-05-11 03:59 AM, Oded Gabbay wrote:
> > On Fri, Mar 23, 2018 at 10:32 PM, Felix Kuehling <Felix.Kuehling at amd.com> wrote:
> >> When an MMU notifier runs in memory reclaim context, it can deadlock
> >> trying to take locks that are already held in the thread causing the
> >> memory reclaim. The solution is to avoid memory reclaim while holding
> >> locks that are taken in MMU notifiers by using GFP_NOIO.
> > Which locks are problematic ?
>
> The only lock I need to take in our MMU notifier is the DQM lock.
>
> >
> > The kernel recommendation is to use "memalloc_noio_{save,restore} to
> > mark the whole scope which cannot perform any IO with a short
> > explanation why"
>
> Yeah. Looking at it more, I think the correct one to use is actually
> memalloc_nofs_{save,restore}.
>
> >
> > By using the scope functions, you protect against future allocation
> > code that will be written in the critical path, without worrying about
> > the developer using the correct GFP_NOIO flag.
>
> Yes. Last time I looked into this it was broken and didn't properly
> handle kmalloc allocations. It looks like this was fixed by this commit:
>
> commit 6d7225f0cc1a1fc32cf5dd01b4ab4b8a34c7cdb4
> Author: Nikolay Borisov <nborisov at suse.com>
> Date:   Wed May 3 14:53:05 2017 -0700
>
>     lockdep: teach lockdep about memalloc_noio_save
>
>
> Later NOFS was introduced, which is now used by the lockdep checker to
> detect reclaim deadlocks.
>
> Regards,
>   Felix
>
> >
> > Oded
> >
> >> This commit fixes memory allocations done while holding the dqm->lock
> >> which is needed in the MMU notifier (dqm->ops.evict_process_queues).
> >>
> >> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
> >> ---
> >>  drivers/gpu/drm/amd/amdkfd/kfd_device.c          | 2 +-
> >>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 2 +-
> >>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c  | 2 +-
> >>  3 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >> index 334669996..0434f65 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >> @@ -652,7 +652,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
> >>  	if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
> >>  		return -ENOMEM;
> >>
> >> -	*mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);
> >> +	*mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
> >>  	if ((*mem_obj) == NULL)
> >>  		return -ENOMEM;
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> >> index c00c325..2bc49c6 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> >> @@ -412,7 +412,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
> >>  	if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
> >>  		return NULL;
> >>
> >> -	mqd = kzalloc(sizeof(*mqd), GFP_KERNEL);
> >> +	mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
> >>  	if (!mqd)
> >>  		return NULL;
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> >> index 89e4242..481307b 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> >> @@ -394,7 +394,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
> >>  	if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
> >>  		return NULL;
> >>
> >> -	mqd = kzalloc(sizeof(*mqd), GFP_KERNEL);
> >> +	mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
> >>  	if (!mqd)
> >>  		return NULL;
> >>
> >> --
> >> 2.7.4
> >>
> >