> -----Original Message----- > From: Auld, Matthew <matthew.auld@xxxxxxxxx> > Sent: Wednesday, September 11, 2024 2:05 PM > To: Upadhyay, Tejas <tejas.upadhyay@xxxxxxxxx>; intel- > xe@xxxxxxxxxxxxxxxxxxxxx > Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@xxxxxxxxx>; Thomas > Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>; stable@xxxxxxxxxxxxxxx > Subject: Re: [PATCH 2/4] drm/xe/client: add missing bo locking in > show_meminfo() > > Hi, > > On 11/09/2024 06:39, Upadhyay, Tejas wrote: > > > > > >> -----Original Message----- > >> From: Auld, Matthew <matthew.auld@xxxxxxxxx> > >> Sent: Tuesday, September 10, 2024 6:42 PM > >> To: intel-xe@xxxxxxxxxxxxxxxxxxxxx > >> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@xxxxxxxxx>; > >> Upadhyay, Tejas <tejas.upadhyay@xxxxxxxxx>; Thomas Hellström > >> <thomas.hellstrom@xxxxxxxxxxxxxxx>; stable@xxxxxxxxxxxxxxx > >> Subject: [PATCH 2/4] drm/xe/client: add missing bo locking in > >> show_meminfo() > >> > >> bo_meminfo() wants to inspect bo state like tt and the ttm resource, > >> however this state can change at any point leading to stuff like NPD > >> and UAF, if the bo lock is not held. Grab the bo lock when calling > >> bo_meminfo(), ensuring we drop any spinlocks first. In the case of > >> object_idr we now also need to hold a ref. > >> > >> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats > >> printing") > >> Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx> > >> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@xxxxxxxxx> > >> Cc: Tejas Upadhyay <tejas.upadhyay@xxxxxxxxx> > >> Cc: "Thomas Hellström" <thomas.hellstrom@xxxxxxxxxxxxxxx> > >> Cc: <stable@xxxxxxxxxxxxxxx> # v6.8+ > >> --- > >> drivers/gpu/drm/xe/xe_drm_client.c | 37 > +++++++++++++++++++++++++++--- > >> 1 file changed, 34 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c > >> b/drivers/gpu/drm/xe/xe_drm_client.c > >> index badfa045ead8..3cca741c500c 100644 > >> --- a/drivers/gpu/drm/xe/xe_drm_client.c > >> +++ b/drivers/gpu/drm/xe/xe_drm_client.c > >> @@ -10,6 +10,7 @@ > >> #include <linux/slab.h> > >> #include <linux/types.h> > >> > >> +#include "xe_assert.h" > >> #include "xe_bo.h" > >> #include "xe_bo_types.h" > >> #include "xe_device_types.h" > >> @@ -151,10 +152,13 @@ void xe_drm_client_add_bo(struct > xe_drm_client > >> *client, > >> */ > >> void xe_drm_client_remove_bo(struct xe_bo *bo) { > >> + struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev); > >> struct xe_drm_client *client = bo->client; > >> > >> + xe_assert(xe, !kref_read(&bo->ttm.base.refcount)); > >> + > >> spin_lock(&client->bos_lock); > >> - list_del(&bo->client_link); > >> + list_del_init(&bo->client_link); > >> spin_unlock(&client->bos_lock); > >> > >> xe_drm_client_put(client); > >> @@ -207,7 +211,20 @@ static void show_meminfo(struct drm_printer *p, > >> struct drm_file *file) > >> idr_for_each_entry(&file->object_idr, obj, id) { > >> struct xe_bo *bo = gem_to_xe_bo(obj); > >> > >> - bo_meminfo(bo, stats); > >> + if (dma_resv_trylock(bo->ttm.base.resv)) { > >> + bo_meminfo(bo, stats); > >> + xe_bo_unlock(bo); > >> + } else { > >> + xe_bo_get(bo); > >> + spin_unlock(&file->table_lock); > >> + > >> + xe_bo_lock(bo, false); > >> + bo_meminfo(bo, stats); > >> + xe_bo_unlock(bo); > >> + > >> + xe_bo_put(bo); > >> + spin_lock(&file->table_lock); > >> + } > >> } > >> spin_unlock(&file->table_lock); > >> > >> @@ -217,7 +234,21 @@ static void show_meminfo(struct drm_printer *p, > >> struct drm_file *file) > >> if (!kref_get_unless_zero(&bo->ttm.base.refcount)) > >> continue; > >> > > > > While we have ref to BO, why would it need lock here, can you please > explain if I am missing something. I though BO cant be deleted till will hold > ref? > > The ref is just about protecting the lifetime of the bo, however the internal bo > state in particular the ttm stuff, is all protected by holding the dma-resv bo > lock. > > For example the bo can be moved/evicted around at will and the object state > changes with it, but that should be done only when also holding the bo lock. > If we are holding the bo lock here then the object state should be stable, > making it safe to inspect stuff like bo->ttm.ttm and > bo->ttm.resource. As an example, if you look at ttm_bo_move_null() and > imagine xe_bo_has_pages() racing with that, then NPD or UAF is possible. > Ok, meaning ref will protect deletion of bo while in used, but internal state of bo's fields can still change with holding ref. Got it, Reviewed-by: Tejas Upadhyay <tejas.upadhyay@xxxxxxxxx> Thanks, Tejas > > > > Thanks, > > Tejas > >> - bo_meminfo(bo, stats); > >> + if (dma_resv_trylock(bo->ttm.base.resv)) { > >> + bo_meminfo(bo, stats); > >> + xe_bo_unlock(bo); > >> + } else { > >> + spin_unlock(&client->bos_lock); > >> + > >> + xe_bo_lock(bo, false); > >> + bo_meminfo(bo, stats); > >> + xe_bo_unlock(bo); > >> + > >> + spin_lock(&client->bos_lock); > >> + /* The bo ref will prevent this bo from being removed > >> from the list */ > >> + xe_assert(xef->xe, !list_empty(&bo->client_link)); > >> + } > >> + > >> xe_bo_put_deferred(bo, &deferred); > >> } > >> spin_unlock(&client->bos_lock); > >> -- > >> 2.46.0 > >