Am 20.02.23 um 11:23 schrieb Tvrtko Ursulin:
On 20/02/2023 10:01, Christian König wrote:Am 20.02.23 um 10:55 schrieb Tvrtko Ursulin:Hi, On 14/02/2023 13:59, Christian König wrote:Am 14.02.23 um 13:50 schrieb Tvrtko Ursulin:From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> Currently drm_gem_handle_create_tail exposes the handle to userspace before the buffer object constructions is complete. This allowingof working against a partially constructed object, which may also be inthe process of having its creation fail, can have a range of negative outcomes.A lot of those will depend on what the individual drivers are doing in their obj->funcs->open() callbacks, and also with a common failure modebeing -ENOMEM from drm_vma_node_allow. We can make sure none of this can happen by allocating a handle last, although with a downside that more of the function now runs under the dev->object_name_lock. Looking into the individual drivers open() hooks, we haveamdgpu_gem_object_open which seems like it could have a potential securityissue without this change. A couple drivers like qxl_gem_object_open and vmw_gem_object_open implement no-op hooks so no impact for them.A bunch of other require a deeper look by individual owners to asses forimpact. Those are lima_gem_object_open, nouveau_gem_object_open,panfrost_gem_open, radeon_gem_object_open and virtio_gpu_gem_object_open.Putting aside the risk assesment of the above, some common scenarios tothink about are along these lines: 1)Userspace closes a handle by speculatively "guessing" it from a secondthread. This results in an unreachable buffer object so, a memory leak. 2) Same as 1), but object is in the process of getting closed (failed creation).The second thread is then able to re-cycle the handle and idr_remove would in the first thread would then remove the handle it does not own from theidr. 3)Going back to the earlier per driver problem space - individual impact assesment of allowing a second thread to access and operate on a partiallyconstructed handle / object. (Can something crash? Leak information?)In terms of identifying when the problem started I will tag some patchesas references, but not all, if even any, of them actually point to abroken state. I am just identifying points at which more opportunity forissues to arise was added.Yes I've looked into this once as well, but couldn't completely solve it for some reason.Give me a day or two to get this tested and all the logic swapped back into my head again.Managed to recollect what the problem with earlier attempts was?Nope, that's way to long ago. I can only assume that I ran into problems with the object_name_lock.Probably best to double check if that doesn't result in a lock inversion when somebody grabs the reservation lock in their ->load() callback.Hmm I don't immediately follow the connection. But I have only found radeon_driver_load_kms as using the load callback. Is there any lockdep enabled CI for that driver which could tell us if there is a problem there?
We don't have CI for radeon and most likely never will, that hw is just to old. But we also have amdgpu_gem_object_open() and that looks suspiciously like trouble.
The function makes sure that every BO is registered in the VM house keeping functions of the drm_file and while doing so grabs a few locks. I'm not sure what the locking order of those are.
Could be that this will work, could be that it breaks. I will ping internally today if somebody from my team can take a look at this patch.
Regards, Christian.
Regards, TvrtkoRegards, Christian.Regards, TvrtkoChristian.References: 304eda32920b ("drm/gem: add hooks to notify driver when object handle is created/destroyed")References: ca481c9b2a3a ("drm/gem: implement vma access management") References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs") Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx Cc: Rob Clark <robdclark@xxxxxxxxxxxx> Cc: Ben Skeggs <bskeggs@xxxxxxxxxx> Cc: David Herrmann <dh.herrmann@xxxxxxxxx> Cc: Noralf Trønnes <noralf@xxxxxxxxxxx> Cc: David Airlie <airlied@xxxxxxxxx> Cc: Daniel Vetter <daniel@xxxxxxxx> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: lima@xxxxxxxxxxxxxxxxxxxxx Cc: nouveau@xxxxxxxxxxxxxxxxxxxxx Cc: Steven Price <steven.price@xxxxxxx> Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx Cc: spice-devel@xxxxxxxxxxxxxxxxxxxxx Cc: Zack Rusin <zackr@xxxxxxxxxx> ---drivers/gpu/drm/drm_gem.c | 48 +++++++++++++++++++--------------------1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index aa15c52ae182..e3d897bca0f2 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c@@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,u32 *handlep) { struct drm_device *dev = obj->dev; - u32 handle; int ret; WARN_ON(!mutex_is_locked(&dev->object_name_lock)); if (obj->handle_count++ == 0) drm_gem_object_get(obj); + ret = drm_vma_node_allow(&obj->vma_node, file_priv); + if (ret) + goto err_put; + + if (obj->funcs->open) { + ret = obj->funcs->open(obj, file_priv); + if (ret) + goto err_revoke; + } + /* - * Get the user-visible handle using idr. Preload and perform - * allocation under our spinlock. + * Get the user-visible handle using idr as the _last_ step. + * Preload and perform allocation under our spinlock. */ idr_preload(GFP_KERNEL); spin_lock(&file_priv->table_lock); - ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT); - spin_unlock(&file_priv->table_lock); idr_preload_end(); - mutex_unlock(&dev->object_name_lock); if (ret < 0) - goto err_unref; - - handle = ret; + goto err_close; - ret = drm_vma_node_allow(&obj->vma_node, file_priv); - if (ret) - goto err_remove; + mutex_unlock(&dev->object_name_lock); - if (obj->funcs->open) { - ret = obj->funcs->open(obj, file_priv); - if (ret) - goto err_revoke; - } + *handlep = ret; - *handlep = handle; return 0; +err_close: + if (obj->funcs->close) + obj->funcs->close(obj, file_priv); err_revoke: drm_vma_node_revoke(&obj->vma_node, file_priv); -err_remove: - spin_lock(&file_priv->table_lock); - idr_remove(&file_priv->object_idr, handle); - spin_unlock(&file_priv->table_lock); -err_unref: - drm_gem_object_handle_put_unlocked(obj); +err_put: + if (--obj->handle_count == 0) + drm_gem_object_put(obj); + + mutex_unlock(&dev->object_name_lock); + return ret; }