On 29.08.23 at 09:29, Boris Brezillon wrote:
On Tue, 29 Aug 2023 05:34:23 +0300
Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
On 8/28/23 13:12, Boris Brezillon wrote:
On Sun, 27 Aug 2023 20:54:43 +0300
Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
In preparation for adding the drm-shmem memory shrinker, move all reservation
locking lockdep checks to use the new drm_gem_shmem_resv_assert_held(), which
resolves a spurious lockdep warning about wrong locking order vs
fs_reclaim code paths during freeing of a shmem GEM, where lockdep isn't
aware that it's impossible to have locking contention with fs_reclaim
at this special time.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 37 +++++++++++++++++---------
1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index d96fee3d6166..ca5da976aafa 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -128,6 +128,23 @@ struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t
}
EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
+static void drm_gem_shmem_resv_assert_held(struct drm_gem_shmem_object *shmem)
+{
+ /*
+ * Destroying the object is a special case.. drm_gem_shmem_free()
+ * calls many things that WARN_ON if the obj lock is not held. But
+ * acquiring the obj lock in drm_gem_shmem_free() can cause a locking
+ * order inversion between reservation_ww_class_mutex and fs_reclaim.
+ *
+ * This deadlock is not actually possible, because no one should
+ * be already holding the lock when drm_gem_shmem_free() is called.
+ * Unfortunately lockdep is not aware of this detail. So when the
+ * refcount drops to zero, we pretend it is already locked.
+ */
+ if (kref_read(&shmem->base.refcount))
+ dma_resv_assert_held(shmem->base.resv);
+}
+
/**
* drm_gem_shmem_free - Free resources associated with a shmem GEM object
* @shmem: shmem GEM object to free
@@ -142,8 +159,6 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem)
if (obj->import_attach) {
drm_prime_gem_destroy(obj, shmem->sgt);
} else if (!shmem->imported_sgt) {
- dma_resv_lock(shmem->base.resv, NULL);
-
drm_WARN_ON(obj->dev, kref_read(&shmem->vmap_use_count));
if (shmem->sgt) {
@@ -156,8 +171,6 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem)
drm_gem_shmem_put_pages_locked(shmem);
AFAICT, drm_gem_shmem_put_pages_locked() is the only function that's
called in the free path and would complain about resv-lock not being
held. I think I'd feel more comfortable if we were adding a
drm_gem_shmem_free_pages() function that did everything
drm_gem_shmem_put_pages_locked() does except for the lock_held() check
and the refcount dec, and have it called here (and in
drm_gem_shmem_put_pages_locked()). This way we can keep using
dma_resv_assert_held() instead of having our own variant.
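To make the proposed split concrete, here is a minimal user-space sketch, not kernel code. The mock_* names and the resv_locked flag are hypothetical stand-ins for the real structures and for dma_resv_assert_held(); only the shape of the split (a lock-free page-release helper shared by the put and free paths) follows the suggestion above.

```c
/* User-space model only: mock types stand in for the kernel structures. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_shmem {
	bool resv_locked;             /* models "resv lock is held" */
	unsigned int pages_use_count; /* models the pages refcount */
	void *pages;                  /* non-NULL while pages are allocated */
};

/* Hypothetical helper: release pages, no lock check, no refcount change. */
static void mock_shmem_free_pages(struct mock_shmem *shmem)
{
	shmem->pages = NULL;
}

/* Normal path: keeps the plain lock assertion and the refcount decrement. */
static void mock_shmem_put_pages_locked(struct mock_shmem *shmem)
{
	assert(shmem->resv_locked); /* models dma_resv_assert_held() */
	assert(shmem->pages_use_count > 0);
	if (--shmem->pages_use_count > 0)
		return;
	mock_shmem_free_pages(shmem);
}

/* Free path: the lock is never taken, so only the lock-free helper runs. */
static void mock_shmem_free(struct mock_shmem *shmem)
{
	if (shmem->pages)
		mock_shmem_free_pages(shmem);
}
```

With this shape, the regular put path still asserts the lock unconditionally, and the free path never reaches that assertion.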
It's not only drm_gem_shmem_free_pages(), but any drm-shmem function
that drivers may use in the GEM's freeing callback.
For example, panfrost_gem_free_object() may unpin shmem BO and then do
drm_gem_shmem_free().
Is this really a valid use case?
I haven't followed the whole discussion, but I think it isn't a valid
use case.
That pages_use_count is non-zero while the GEM object is about to be
destroyed can only happen if someone managed to grab a reference to the
page without referencing the GEM object.
This in turn usually happens when somebody incorrectly walks the CPU
page tables and grabs page references where they shouldn't. KMS used to do
this, and we already had a discussion that they shouldn't do this.
Regards,
Christian.
If the GEM refcount dropped to zero,
we should certainly not have pages_pin_count > 0 (thinking of vmap-ed
buffers that might disappear while kernel still has a pointer to the
CPU-mapped area). The only reason we have this
drm_gem_shmem_put_pages_locked() in drm_gem_shmem_free() is because of
this implicit ref held by the sgt, and IMHO, we should be stricter and
check that pages_use_count == 1 when sgt != NULL and pages_use_count ==
0 otherwise.
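The stricter check described above could look roughly like the following user-space sketch. The struct and function names are hypothetical; the real check would presumably be a drm_WARN_ON() in the free path rather than a boolean helper.

```c
/* User-space model only: hypothetical names, not the kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_shmem_state {
	const void *sgt;              /* non-NULL if an sg table exists */
	unsigned int pages_use_count; /* models the pages refcount */
};

/*
 * At free time the only legitimate pages reference is the implicit one
 * held by the sgt: expect exactly 1 with an sgt, exactly 0 without.
 */
static bool mock_free_state_is_valid(const struct mock_shmem_state *s)
{
	unsigned int expected = s->sgt ? 1 : 0;

	return s->pages_use_count == expected;
}
```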
I actually think it's a good thing to try and catch any attempt to call
functions trying to lock the resv in a path they're not supposed to. At
least we can decide whether these actions are valid or not in this
context, and provide dedicated helpers for the free path if they are.