Re: [PATCH 2/2] drm: Revert syncobj timeline changes.

On 2018-11-12 18:48, Chris Wilson wrote:
Quoting Christian König (2018-11-12 10:16:01)
On 09.11.18 at 23:26, Eric Anholt wrote:

     Eric Anholt <eric@xxxxxxxxxx> writes:


         [ Unknown signature status ]
         zhoucm1 <zhoucm1@xxxxxxx> writes:


             On 2018-11-09 00:52, Christian König wrote:

                 On 08.11.18 at 17:07, Koenig, Christian wrote:

                     On 08.11.18 at 17:04, Eric Anholt wrote:

                         Daniel suggested I submit this, since we're still seeing regressions
                         from it.  This is a revert to before 48197bc564c7 ("drm: add syncobj
                         timeline support v9") and its followon fixes.

                     This is a harmless false positive from lockdep, Chouming and I are
                     already working on a fix.

                 On the other hand we had enough trouble with that patch, so if it
                 really bothers you feel free to add my Acked-by: Christian König
                 <christian.koenig@xxxxxxx> and push it.

             NAK, please no. I don't think this is needed. The warning isn't
             related to the syncobj timeline at all; it is a flaw in the
             fence-array implementation that syncobj merely exposed.
             In addition, Christian already has a fix for this warning, which
             I've tested. Christian, please send it out for public review.

         I backed out my revert of #2 (#1 still necessary) after adding the
         lockdep regression fix, and now my CTS run got oomkilled after just a
         few hours, with these notable lines in the unreclaimable slab info list:

         [ 6314.373099] drm_sched_fence        69095KB      69095KB
         [ 6314.373653] kmemleak_object       428249KB     428384KB
         [ 6314.373736] kmalloc-262144           256KB        256KB
         [ 6314.373743] kmalloc-131072           128KB        128KB
         [ 6314.373750] kmalloc-65536             64KB         64KB
         [ 6314.373756] kmalloc-32768           1472KB       1728KB
         [ 6314.373763] kmalloc-16384             64KB         64KB
         [ 6314.373770] kmalloc-8192             208KB        208KB
         [ 6314.373778] kmalloc-4096            2408KB       2408KB
         [ 6314.373784] kmalloc-2048             288KB        336KB
         [ 6314.373792] kmalloc-1024            1457KB       1512KB
         [ 6314.373800] kmalloc-512              854KB       1048KB
         [ 6314.373808] kmalloc-256              188KB        268KB
         [ 6314.373817] kmalloc-192            69141KB      69142KB
         [ 6314.373824] kmalloc-64             47703KB      47704KB
         [ 6314.373886] kmalloc-128            46396KB      46396KB
         [ 6314.373894] kmem_cache                31KB         35KB

         No results from kmemleak, though.

     OK, it looks like the #2 revert probably isn't related to the OOM issue.
     Running a single job on otherwise unused DRM, watching /proc/slabinfo
     every second for drm_sched_fence, I get:

     drm_sched_fence        0      0    192   21    1 : tunables   32   16    8 : slabdata      0      0      0 : globalstat       0      0     0    0    0    0    0    0    0 : cpustat      0      0      0      0
     drm_sched_fence       16     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
     drm_sched_fence       13     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
     drm_sched_fence        6     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
     drm_sched_fence        4     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
     drm_sched_fence        2     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
     drm_sched_fence        0     21    192   21    1 : tunables   32   16    8 : slabdata      0      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0

     So we generate a ton of fences, and I guess free them slowly because of
     RCU?  And presumably kmemleak was sucking up lots of memory because of
     how many of these objects were lying around.
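
For anyone wanting to repeat the per-second /proc/slabinfo sampling described
above, here is a minimal sketch. It assumes the slabinfo 2.1 column layout
seen in the lines quoted above (name, active_objs, num_objs, objsize, ...).
The function names are hypothetical and not part of any kernel tooling, and
reading /proc/slabinfo usually requires root.

```python
import time


def parse_slabinfo_line(line):
    """Parse the leading columns of one /proc/slabinfo (version 2.1) line."""
    # Everything before the first ':' is: name, active_objs, num_objs,
    # objsize, objperslab, pagesperslab.
    head = line.split(":")[0].split()
    active_objs, num_objs, objsize = (int(v) for v in head[1:4])
    return {
        "name": head[0],
        "active_objs": active_objs,
        "num_objs": num_objs,
        "objsize": objsize,
        # Memory held by live objects only, in KiB.
        "active_kib": active_objs * objsize / 1024,
    }


def find_cache(text, cache_name):
    """Return the parsed entry for cache_name, or None if it is absent."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == cache_name:
            return parse_slabinfo_line(line)
    return None


def watch(cache_name="drm_sched_fence", interval=1.0, samples=10,
          path="/proc/slabinfo"):
    """Print the cache's object counts once per interval (hypothetical helper)."""
    for _ in range(samples):
        with open(path) as f:  # usually requires root
            info = find_cache(f.read(), cache_name)
        print(info)
        time.sleep(interval)
```

Calling watch() while a single job runs should show active_objs rising and
then draining, as in the sample above.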


That is certainly possible. Another possibility is that we don't drop the
reference in dma-fence-array early enough.

E.g. the dma-fence-array will keep the reference to its fences until it is
destroyed, which is a bit late when you chain multiple dma-fence-array objects
together.
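
To illustrate the concern, here is a toy reference-counting model. It is not
the kernel's dma-fence-array code: the Fence and FenceArray classes, the
drop_on_signal switch and the manual signal() call are all assumptions made
for the sketch. It only shows how a chain of arrays that each hold their
members until destruction keeps every object alive for as long as the head
of the chain is referenced.

```python
class Fence:
    """Toy refcounted object; 'freed' records when the last ref is dropped."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # creator's reference
        self.freed = False

    def get(self):
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.release()

    def release(self):
        self.freed = True


class FenceArray(Fence):
    """Toy container that takes one reference on each member fence."""

    def __init__(self, name, fences, drop_on_signal):
        super().__init__(name)
        self.fences = [f.get() for f in fences]
        self.drop_on_signal = drop_on_signal  # hypothetical switch

    def signal(self):
        # Assumed early-release policy: drop members once signaled.
        if self.drop_on_signal:
            self._drop_members()

    def _drop_members(self):
        fences, self.fences = self.fences, []
        for f in fences:
            f.put()

    def release(self):
        # Late-release policy: members are only dropped on destruction.
        self._drop_members()
        super().release()


def live_after_signal(depth, drop_on_signal):
    """Chain 'depth' arrays, signal them all, count objects not yet freed."""
    objs, prev = [], None
    for i in range(depth):
        leaf = Fence(f"leaf{i}")
        members = [leaf] if prev is None else [prev, leaf]
        arr = FenceArray(f"array{i}", members, drop_on_signal)
        for m in members:
            m.put()  # creator hands its references over to the array
        objs += [leaf, arr]
        prev = arr
    for o in objs:
        if isinstance(o, FenceArray):
            o.signal()
    # The caller still holds one reference on the head of the chain (prev).
    return sum(not o.freed for o in objs)
```

In this model, with a chain of four arrays, holding members until destruction
leaves all eight objects alive, while dropping them on signal leaves only the
head array.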

David can you take a look at this and propose a fix? That would probably be
good to have fixed in dma-fence-array separately to the timeline work.

Note that drm_syncobj_replace_fence() leaks any existing fence for
!timeline syncobjs.

Hi Chris,

Hui! Isn't the existing fence garbage-collected?

Could you point out where and how the existing fence is leaked?

Thanks,
David

Which, coupled with the linear search, ends up with
a severe regression in both time and memory.
-Chris

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel