On 2018年11月12日 18:16, Christian König
wrote:
Am 09.11.18 um 23:26 schrieb Eric
Anholt:
Eric Anholt <eric@xxxxxxxxxx> writes:
[ Unknown signature status ]
zhoucm1 <zhoucm1@xxxxxxx> writes:
On 2018年11月09日 00:52, Christian König wrote:
Am 08.11.18 um 17:07 schrieb Koenig, Christian:
Am 08.11.18 um 17:04 schrieb Eric Anholt:
Daniel suggested I submit this, since we're still seeing regressions
from it. This is a revert to before 48197bc564c7 ("drm: add syncobj
timeline support v9") and its followon fixes.
This is a harmless false positive from lockdep, Chouming and I are
already working on a fix.
On the other hand we had enough trouble with that patch, so if it
really bothers you feel free to add my Acked-by: Christian König
<christian.koenig@xxxxxxx> and push it.
NAK, please no, I don't think this needed, the Warning totally isn't
related to syncobj timeline, but fence-array implementation flaw, just
exposed by syncobj.
In addition, Christian already has a fix for this Warning, I've tested.
Please Christian send to public review.
I backed out my revert of #2 (#1 still necessary) after adding the
lockdep regression fix, and now my CTS run got oomkilled after just a
few hours, with these notable lines in the unreclaimable slab info list:
[ 6314.373099] drm_sched_fence 69095KB 69095KB
[ 6314.373653] kmemleak_object 428249KB 428384KB
[ 6314.373736] kmalloc-262144 256KB 256KB
[ 6314.373743] kmalloc-131072 128KB 128KB
[ 6314.373750] kmalloc-65536 64KB 64KB
[ 6314.373756] kmalloc-32768 1472KB 1728KB
[ 6314.373763] kmalloc-16384 64KB 64KB
[ 6314.373770] kmalloc-8192 208KB 208KB
[ 6314.373778] kmalloc-4096 2408KB 2408KB
[ 6314.373784] kmalloc-2048 288KB 336KB
[ 6314.373792] kmalloc-1024 1457KB 1512KB
[ 6314.373800] kmalloc-512 854KB 1048KB
[ 6314.373808] kmalloc-256 188KB 268KB
[ 6314.373817] kmalloc-192 69141KB 69142KB
[ 6314.373824] kmalloc-64 47703KB 47704KB
[ 6314.373886] kmalloc-128 46396KB 46396KB
[ 6314.373894] kmem_cache 31KB 35KB
No results from kmemleak, though.
OK, it looks like the #2 revert probably isn't related to the OOM issue.
Running a single job on otherwise unused DRM, watching /proc/slabinfo
every second for drm_sched_fence, I get:
drm_sched_fence 0 0 192 21 1 : tunables 32 16 8 : slabdata 0 0 0 : globalstat 0 0 0 0 0 0 0 0 0 : cpustat 0 0 0 0
drm_sched_fence 16 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 13 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 6 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 4 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 2 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 0 21 192 21 1 : tunables 32 16 8 : slabdata 0 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
So we generate a ton of fences, and I guess free them slowly because of
RCU? And presumably kmemleak was sucking up lots of memory because of
how many of these objects were laying around.
Hi Eric,
Thanks for testing, I checked the code, we could forget signal
fence-array immediately after many reviews without callback for
fence-array, everything is waiting fence_wait or syncobj free,
that's why you see "free them slowly".
Maybe we just need one line change as attahced, Could you please
have a try with it on your tests?
btw, I also didn't find there is fence-leak, or you can point me.
Thanks,
David
That is certainly possible. Another possibility is that we don't
drop the reference in dma-fence-array early enough.
E.g. the dma-fence-array will keep the reference to its fences
until it is destroyed, which is a bit late when you chain multiple
dma-fence-array objects together.
David can you take a look at this and propose a fix? That would
probably be good to have fixed in dma-fence-array separately to
the timeline work.
Thanks,
Christian.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
|
>From 6f03f63393be5897ef33a1714ea2ce8d18999014 Mon Sep 17 00:00:00 2001
From: Chunming Zhou <david1.zhou@xxxxxxx>
Date: Tue, 13 Nov 2018 14:10:13 +0800
Subject: [PATCH] drm/syncobj: enable signaling for fence_array of signal pt
otherwise fence_arrya cannot be signaled in time.
Signed-off-by: Chunming Zhou <david1.zhou@xxxxxxx>
---
drivers/gpu/drm/drm_syncobj.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index da2b85eec6cf..109abb5df399 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -253,6 +253,7 @@ static int drm_syncobj_create_signal_pt(struct drm_syncobj *syncobj,
goto fail;
}
+ dma_fence_enable_sw_signaling(&signal_pt->fence_array->base);
spin_lock(&syncobj->pt_lock);
if (syncobj->signal_point >= point) {
DRM_WARN("A later signal is ready!");
--
2.17.1
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel