[AMD Official Use Only - General] > -----Original Message----- > From: Chai, Thomas <YiPeng.Chai@xxxxxxx> > Sent: Sunday, April 7, 2024 10:21 AM > To: Zhou1, Tao <Tao.Zhou1@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Li, Candice > <Candice.Li@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; Yang, > Stanley <Stanley.Yang@xxxxxxx> > Subject: RE: [PATCH] drm/amdgpu: Fix incorrect return value > > [AMD Official Use Only - General] > > ----------------- > Best Regards, > Thomas > > -----Original Message----- > From: Zhou1, Tao <Tao.Zhou1@xxxxxxx> > Sent: Wednesday, April 3, 2024 6:36 PM > To: Chai, Thomas <YiPeng.Chai@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Li, Candice > <Candice.Li@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; Yang, > Stanley <Stanley.Yang@xxxxxxx> > Subject: RE: [PATCH] drm/amdgpu: Fix incorrect return value > > [AMD Official Use Only - General] > > > -----Original Message----- > > From: Chai, Thomas <YiPeng.Chai@xxxxxxx> > > Sent: Wednesday, April 3, 2024 3:07 PM > > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx > > Cc: Chai, Thomas <YiPeng.Chai@xxxxxxx>; Zhang, Hawking > > <Hawking.Zhang@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Li, Candice > > <Candice.Li@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; > > Yang, Stanley <Stanley.Yang@xxxxxxx>; Chai, Thomas > > <YiPeng.Chai@xxxxxxx> > > Subject: [PATCH] drm/amdgpu: Fix incorrect return value > > > > [Why] > > After calling amdgpu_vram_mgr_reserve_range multiple times with the > > same address, calling amdgpu_vram_mgr_query_page_status will always > > return - EBUSY. > > >[Tao] could you explain why we call amdgpu_vram_mgr_reserve_range multiple > times with the same address? IIRC, we skip duplicate address before reserve > memory. > > [Thomas] > When poison creation interrupt is received, since some poisoning addresses may > have been allocated by some processes, reserving these memories will fail. > These memory will be tried to reserve again after killing the poisoned process in > the subsequent poisoning consumption interrupt handler. > so amdgpu_vram_mgr_reserve_range needs to be called multiple times with the > same address. > > > From the second call to amdgpu_vram_mgr_reserve_range, the same > > address will be added to the reservations_pending list again and is > > never moved to the reserved_pages list because the address had been > reserved. [Tao] but if a page is added to reservations_pending list, it should also be put in data->bps array, and when we call amdgpu_ras_add_bad_pages again, amdgpu_ras_check_bad_page_unlock could ignore this page. So except for amdgpu_ras_add_bad_pages, would you like to call amdgpu_vram_mgr_reserve_range in other place? > > > > [How] > > First add the address status check before calling > > amdgpu_vram_mgr_do_reserve, if the address is already reserved, do > > nothing; If the address is already in the reservations_pending list, > > directly reserve memory; only add new nodes for the addresses that are > > not in the reserved_pages list and reservations_pending list. > > > > Signed-off-by: YiPeng Chai <YiPeng.Chai@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28 > > +++++++++++++------- > > 1 file changed, 19 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > > index 1e36c428d254..0bf3f4092900 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > > @@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct > > ttm_resource_manager *man) > > > > dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n", > > rsv->start, rsv->size); > > - > > vis_usage = amdgpu_vram_mgr_vis_size(adev, block); > > atomic64_add(vis_usage, &mgr->vis_usage); > > spin_lock(&man->bdev->lru_lock); @@ -340,19 +339,30 @@ > > int amdgpu_vram_mgr_reserve_range(struct > > amdgpu_vram_mgr *mgr, > > uint64_t start, uint64_t size) { > > struct amdgpu_vram_reservation *rsv; > > + int ret = 0; > > > > - rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); > > - if (!rsv) > > - return -ENOMEM; > > + ret = amdgpu_vram_mgr_query_page_status(mgr, start); > > + if (!ret) > > + return 0; > > + > > + if (ret == -ENOENT) { > > + rsv = kzalloc(sizeof(*rsv), GFP_KERNEL); > > + if (!rsv) > > + return -ENOMEM; > > > > - INIT_LIST_HEAD(&rsv->allocated); > > - INIT_LIST_HEAD(&rsv->blocks); > > + INIT_LIST_HEAD(&rsv->allocated); > > + INIT_LIST_HEAD(&rsv->blocks); > > > > - rsv->start = start; > > - rsv->size = size; > > + rsv->start = start; > > + rsv->size = size; > > + > > + mutex_lock(&mgr->lock); > > + list_add_tail(&rsv->blocks, &mgr->reservations_pending); > > + mutex_unlock(&mgr->lock); > > + > > + } > > > > mutex_lock(&mgr->lock); > > - list_add_tail(&rsv->blocks, &mgr->reservations_pending); > > amdgpu_vram_mgr_do_reserve(&mgr->manager); > > mutex_unlock(&mgr->lock); > > > > -- > > 2.34.1 > >