RE: [PATCH] drm/amdgpu: Fix incorrect return value

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only - General]

> -----Original Message-----
> From: Chai, Thomas <YiPeng.Chai@xxxxxxx>
> Sent: Sunday, April 7, 2024 10:21 AM
> To: Zhou1, Tao <Tao.Zhou1@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Li, Candice
> <Candice.Li@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; Yang,
> Stanley <Stanley.Yang@xxxxxxx>
> Subject: RE: [PATCH] drm/amdgpu: Fix incorrect return value
>
> [AMD Official Use Only - General]
>
> -----------------
> Best Regards,
> Thomas
>
> -----Original Message-----
> From: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
> Sent: Wednesday, April 3, 2024 6:36 PM
> To: Chai, Thomas <YiPeng.Chai@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Li, Candice
> <Candice.Li@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; Yang,
> Stanley <Stanley.Yang@xxxxxxx>
> Subject: RE: [PATCH] drm/amdgpu: Fix incorrect return value
>
> [AMD Official Use Only - General]
>
> > -----Original Message-----
> > From: Chai, Thomas <YiPeng.Chai@xxxxxxx>
> > Sent: Wednesday, April 3, 2024 3:07 PM
> > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: Chai, Thomas <YiPeng.Chai@xxxxxxx>; Zhang, Hawking
> > <Hawking.Zhang@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Li, Candice
> > <Candice.Li@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>;
> > Yang, Stanley <Stanley.Yang@xxxxxxx>; Chai, Thomas
> > <YiPeng.Chai@xxxxxxx>
> > Subject: [PATCH] drm/amdgpu: Fix incorrect return value
> >
> > [Why]
> >   After calling amdgpu_vram_mgr_reserve_range multiple times with the
> > same address, calling amdgpu_vram_mgr_query_page_status will always
> > return - EBUSY.
>
> >[Tao] could you explain why we call amdgpu_vram_mgr_reserve_range multiple
> times with the same  address? IIRC, we skip duplicate address before reserve
> memory.
>
> [Thomas]
>    When poison creation interrupt is received, since some poisoning addresses may
> have been allocated by some processes, reserving these memories will fail.
> These memory will be tried to reserve again after killing the poisoned process in
> the subsequent poisoning consumption interrupt handler.
> so amdgpu_vram_mgr_reserve_range needs to be called multiple times with the
> same address.
>
> >   From the second call to amdgpu_vram_mgr_reserve_range, the same
> > address will be added to the reservations_pending list again and is
> > never moved to the reserved_pages list because the address had been
> reserved.

[Tao] but if a page is added to reservations_pending list, it should also be put in data->bps array, and when we call amdgpu_ras_add_bad_pages again, amdgpu_ras_check_bad_page_unlock could ignore this page.
So except for amdgpu_ras_add_bad_pages, would you like to call amdgpu_vram_mgr_reserve_range in other place?

> >
> > [How]
> >   First add the address status check before calling
> > amdgpu_vram_mgr_do_reserve, if the address is already reserved, do
> > nothing; If the address is already in the reservations_pending list,
> > directly reserve memory; only add new nodes for the addresses that are
> > not in the reserved_pages list and reservations_pending list.
> >
> > Signed-off-by: YiPeng Chai <YiPeng.Chai@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28
> > +++++++++++++-------
> >  1 file changed, 19 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > index 1e36c428d254..0bf3f4092900 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > @@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct
> > ttm_resource_manager *man)
> >
> >               dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n",
> >                       rsv->start, rsv->size);
> > -
> >               vis_usage = amdgpu_vram_mgr_vis_size(adev, block);
> >               atomic64_add(vis_usage, &mgr->vis_usage);
> >               spin_lock(&man->bdev->lru_lock); @@ -340,19 +339,30 @@
> > int amdgpu_vram_mgr_reserve_range(struct
> > amdgpu_vram_mgr *mgr,
> >                                 uint64_t start, uint64_t size)  {
> >       struct amdgpu_vram_reservation *rsv;
> > +     int ret = 0;
> >
> > -     rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
> > -     if (!rsv)
> > -             return -ENOMEM;
> > +     ret = amdgpu_vram_mgr_query_page_status(mgr, start);
> > +     if (!ret)
> > +             return 0;
> > +
> > +     if (ret == -ENOENT) {
> > +             rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
> > +             if (!rsv)
> > +                     return -ENOMEM;
> >
> > -     INIT_LIST_HEAD(&rsv->allocated);
> > -     INIT_LIST_HEAD(&rsv->blocks);
> > +             INIT_LIST_HEAD(&rsv->allocated);
> > +             INIT_LIST_HEAD(&rsv->blocks);
> >
> > -     rsv->start = start;
> > -     rsv->size = size;
> > +             rsv->start = start;
> > +             rsv->size = size;
> > +
> > +             mutex_lock(&mgr->lock);
> > +             list_add_tail(&rsv->blocks, &mgr->reservations_pending);
> > +             mutex_unlock(&mgr->lock);
> > +
> > +     }
> >
> >       mutex_lock(&mgr->lock);
> > -     list_add_tail(&rsv->blocks, &mgr->reservations_pending);
> >       amdgpu_vram_mgr_do_reserve(&mgr->manager);
> >       mutex_unlock(&mgr->lock);
> >
> > --
> > 2.34.1
>
>





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux