Re: [PATCH] drm/etnaviv: lock MMU while dumping core

Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> · Fri, 24 May 2019 16:31:57 +0100

On Wed, May 22, 2019 at 11:55:14AM +0200, Lucas Stach wrote:
> The devcoredump needs to operate on a stable state of the MMU while
> it is writing the MMU state to the coredump. The missing lock
> allowed both the userspace submit, as well as the GPU job finish
> paths to mutate the MMU state while a coredump is under way.

This locking does little to solve this problem.  We actually rely on the
GPU being deadlocked at this point - we aren't expecting the GPU to make
any progress.  The only thing that can realistically happen is for
userspace to submit a new job, but adding this locking does little to
avoid that.

> Fixes: a8c21a5451d8 (drm/etnaviv: add initial etnaviv DRM driver)
> Reported-by: David Jander <david@xxxxxxxxxxx>
> Signed-off-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
> Tested-by: David Jander <david@xxxxxxxxxxx>
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_dump.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> index 33854c94cb85..515515ef24f9 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> @@ -125,6 +125,8 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>  		return;
>  	etnaviv_dump_core = false;
>  
> +	mutex_lock(&gpu->mmu->lock);
> +
>  	mmu_size = etnaviv_iommu_dump_size(gpu->mmu);
>  
>  	/* We always dump registers, mmu, ring and end marker */
> @@ -167,6 +169,7 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>  	iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,
>  			       PAGE_KERNEL);
>  	if (!iter.start) {
> +		mutex_unlock(&gpu->mmu->lock);
>  		dev_warn(gpu->dev, "failed to allocate devcoredump file\n");
>  		return;
>  	}
> @@ -234,6 +237,8 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>  					 obj->base.size);
>  	}
>  
> +	mutex_unlock(&gpu->mmu->lock);
> +

All that this lock does is prevent the lists from changing while we
build up what we're going to write out.  At this point, you drop the
lock, before we've written anything out to the coredump.  Things
can now change, including the ring buffer.

>  	etnaviv_core_dump_header(&iter, ETDUMP_BUF_END, iter.data);
>  
>  	dev_coredumpv(gpu->dev, iter.start, iter.data - iter.start, GFP_KERNEL);

Here we write out the data, which includes the command buffers, ring
buffers, BOs, etc.  The data in the ring buffer can still change
because the lock has been dropped.

However, all those objects should be pinned, so none of them should
go away: things like the command buffers that have been submitted
should be immutable at this point (if they aren't, it could well be
a reason why the GPU has gone awol.)

It is also not nice to hold the lock over the _big_ vmalloc() which
may take some time.

Have you actually seen problems here, or is this just theoretical?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel