On Thu, Aug 8, 2019 at 9:09 AM Koenig, Christian <Christian.Koenig@xxxxxxx> wrote: > > Am 07.08.19 um 23:19 schrieb Daniel Vetter: > > On Wed, Jul 31, 2019 at 10:55:02AM +0200, Daniel Vetter wrote: > >> On Thu, Jun 27, 2019 at 09:28:11AM +0200, Christian König wrote: > >>> Hi Daniel, > >>> > >>> those fails look like something random to me and not related to my patch > >>> set. Correct? > >> First one I looked at has the reservation_obj all over: > >> > >> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13438/fi-cml-u/igt@gem_exec_fence@xxxxxxxxxxxxxxxxxxxxxxx > >> > >> So 5 second guees is ... probably real? > >> > >> Note that with the entire lmem stuff going on right now there's massive > >> discussions about how we're doing resv_obj vs obj->mm.lock the wrong way > >> round in i915, so I'm not surprised at all that you managed to trip this. > >> > >> The way I see it right now is that obj->mm.lock needs to be limited to > >> dealing with the i915 shrinker interactions only, and only for i915 native > >> objects. And for dma-bufs we need to make sure it's not anywhere in the > >> callchain. Unfortunately that's a massive refactor I guess ... > > Thought about this some more, aside from just breaking i915 or waiting > > until it's refactored (Both not awesome) I think the only option is get > > back to the original caching. And figure out whether we really need to > > take the direction into account for that, or whether upgrading to > > bidirectional unconditionally won't be ok. I think there's only really two > > cases where this matters: > > > > - display drivers using the cma/dma_alloc helpers. Everything is allocated > > fully coherent, cpu side wc, no flushing. > > > > - Everyone else (on platforms where there's actually some flushing going > > on) is for rendering gpus, and those always map bidirectional and want > > the mapping cached for as long as possible. > > > > With that we could go back to creating the cached mapping at attach time > > and avoid inflicting the reservation object lock to places that would keel > > over. > > > > Thoughts? > > Actually we had a not so nice internal mail thread with our hardware > guys and it looks like we have tons of hardware bugs/exceptions that > sometimes PCIe BARs are only readable or only writable. So it turned out > that always caching with bidirectional won't work for us either. > > Additional to that I'm not sure how i915 actually triggered the issue, > cause with the current code that shouldn't be possible. Forgot to explain this: i915 has it's own lock for managing buffer state, originally struct_mutex, now also i915_gem_obj->mm.lock. When importing we take both of these before calling into the exporter, when exporting we need these when getting called from the import. If the importer uses the reservation_object lock you get a classic ABBA deadlock. I thought the plan was to push struct_mutex down and obj->mm.lock up until they meet in the middle and we can start using the resv_obj ww_mutex for everything. But looking at some of the latest in-flight patches (I cc'ed you on them) that seems to not really be the plan, which is bad :-/ -Daniel > But independent of that I came to the conclusion that we first need to > get to a common view of what the fences in the reservation mean or > otherwise the whole stuff here isn't going to work smooth either. > > So working on that for now and when that's finished I will come back to > this problem here again. > > Regards, > Christian. > > > > -Daniel > > > -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx