Re: [bug report] drm/msm: devcoredump iommu fault support

Rob Clark <robdclark@xxxxxxxxxxxx> · Mon, 9 May 2022 07:48:23 -0700

On Sun, May 8, 2022 at 11:28 PM Dan Carpenter <dan.carpenter@xxxxxxxxxx> wrote:
>
> Hello Rob Clark,
>
> The patch e25e92e08e32: "drm/msm: devcoredump iommu fault support"
> from Jun 10, 2021, leads to the following Smatch static checker
> warning:
>
> drivers/gpu/drm/msm/msm_gpu.c:418 recover_worker() error: dereferencing freed memory 'gpu'
> drivers/gpu/drm/msm/msm_gpu.c:497 fault_worker() error: dereferencing freed memory 'gpu'
>
> drivers/gpu/drm/msm/msm_gpu.c
>     376 static void recover_worker(struct kthread_work *work)
>     377 {
>     378         struct msm_gpu *gpu = container_of(work, struct msm_gpu, recover_work);
>     379         struct drm_device *dev = gpu->dev;
>     380         struct msm_drm_private *priv = dev->dev_private;
>     381         struct msm_gem_submit *submit;
>     382         struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
>     383         char *comm = NULL, *cmd = NULL;
>     384         int i;
>     385
>     386         mutex_lock(&gpu->lock);
>     387
>     388         DRM_DEV_ERROR(dev->dev, "%s: hangcheck recover!\n", gpu->name);
>     389
>     390         submit = find_submit(cur_ring, cur_ring->memptrs->fence + 1);
>     391         if (submit) {
>     392                 /* Increment the fault counts */
>     393                 submit->queue->faults++;
>     394                 submit->aspace->faults++;
>     395
>     396                 get_comm_cmdline(submit, &comm, &cmd);
>     397
>     398                 if (comm && cmd) {
>     399                         DRM_DEV_ERROR(dev->dev, "%s: offending task: %s (%s)\n",
>     400                                 gpu->name, comm, cmd);
>     401
>     402                         msm_rd_dump_submit(priv->hangrd, submit,
>     403                                 "offending task: %s (%s)", comm, cmd);
>     404                 } else {
>     405                         msm_rd_dump_submit(priv->hangrd, submit, NULL);
>     406                 }
>     407         } else {
>     408                 /*
>     409                  * We couldn't attribute this fault to any particular context,
>     410                  * so increment the global fault count instead.
>     411                  */
>     412                 gpu->global_faults++;
>     413         }
>     414
>     415         /* Record the crash state */
>     416         pm_runtime_get_sync(&gpu->pdev->dev);
>     417         msm_gpu_crashstate_capture(gpu, submit, comm, cmd);
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This function calls:
>
>         dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0, GFP_KERNEL,
>                                                   ^^^
> Which kfrees gpu.

How does the gpu object get kfree'd?  That is the root problem, it
shouldn't be freed until module unload.  I don't think e25e92e08e32:
"drm/msm: devcoredump iommu fault support" is actually related.

Is there a way to reproduce this?

BR,
-R

> --> 418         pm_runtime_put_sync(&gpu->pdev->dev);
>                                      ^^^^^
> The gpu wasn't supposed to be free so a lot of things go wrong from
> this point.
>
>     419
>     420         kfree(cmd);
>     421         kfree(comm);
>     422
>     423         /*
>     424          * Update all the rings with the latest and greatest fence.. this
>     425          * needs to happen after msm_rd_dump_submit() to ensure that the
>     426          * bo's referenced by the offending submit are still around.
>     427          */
>
> regards,
> dan carpenter