On Wed, 25 Jan 2023 at 14:20, Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>
> On 25.01.23 at 13:53, Matthew Auld wrote:
> > On Wed, 25 Jan 2023 at 11:35, Christian König
> > <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >> On 25.01.23 at 11:21, Matthew Auld wrote:
> >>> On Wed, 25 Jan 2023 at 10:07, Christian König
> >>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >>>> On 25.01.23 at 10:56, Matthew Auld wrote:
> >>>>> On Tue, 24 Jan 2023 at 17:15, Matthew Auld
> >>>>> <matthew.william.auld@xxxxxxxxx> wrote:
> >>>>>> On Tue, 24 Jan 2023 at 13:48, Matthew Auld
> >>>>>> <matthew.william.auld@xxxxxxxxx> wrote:
> >>>>>>> On Tue, 24 Jan 2023 at 12:57, Christian König
> >>>>>>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >>>>>>>> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
> >>>>>>>>
> >>>>>>>> Make sure we can at least move and alloc TT objects without backing store.
> >>>>>>>>
> >>>>>>>> v2: clear the tt object even when no resource is allocated.
> >>>>>>>> v3: add Matthew's changes for i915 as well.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> >>>>>>> Reviewed-by: Matthew Auld <matthew.auld@xxxxxxxxx>
> >>>>>> Ofc that assumes intel-gfx CI is now happy with the series.
> >>>>> There are still some nasty failures it seems (in the extended test
> >>>>> list). But it looks like the series is already merged. Can we quickly
> >>>>> revert and try again?
> >>>> Ah, crap. I thought everything would be fine after the CI gave its go.
> >>>>
> >>>> Which patch is causing the fallout?
> >>> I'm not sure. I think all of the patches kind of interact with each
> >>> other, but for sure there is an issue with the first patch. There is
> >>> one splat like:
> >> Well, I would rather revert as little as possible.
> >>
> >> Are you sure that this isn't only on some i915 specific branch with not
> >> yet upstream changes?
> > Yeah, that splat is taken directly from the CI results reported with
> > this series. So it's just your series applied on top of drm-tip.
> >
> > Can you take a look at the first patch here:
> > https://patchwork.freedesktop.org/series/113332/
> >
> > Maybe you have a better idea? For reference the IGTs that we have for
> > verifying userspace object clearing are now failing, so hoping that
> > fixes it. The other two patches I'm hoping will fix the splat.
>
> The TTM change looks like a good idea to me. Feel free to add my rb to
> this one.
>
> I can't say much about the i915 changes.
>
> Maybe we should revert the two TTM patches to not allocate resources for
> now and fix i915 first?

From what I can see, we would need to revert all three TTM patches,
keeping just the i915 one. Reverting for now I think makes sense.

>
> Christian.
>
> >
> >> I can't even find the i915_gem_obj_copy_ttm function in drm-misc-next
> >> nor drm-next.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> <1>[ 109.735148] BUG: kernel NULL pointer dereference, address: 0000000000000010
> >>> <1>[ 109.735151] #PF: supervisor read access in kernel mode
> >>> <1>[ 109.735152] #PF: error_code(0x0000) - not-present page
> >>> <6>[ 109.735153] PGD 0 P4D 0
> >>> <4>[ 109.735155] Oops: 0000 [#1] PREEMPT SMP NOPTI
> >>> <4>[ 109.735157] CPU: 1 PID: 92 Comm: kworker/u12:6 Not tainted 6.2.0-rc5-Patchwork_113269v1-gc4d436608c4e+ #1
> >>> <4>[ 109.735159] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390 Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
> >>> <4>[ 109.735160] Workqueue: events_unbound async_run_entry_fn
> >>> <4>[ 109.735163] RIP: 0010:i915_ttm_resource_mappable+0x4/0x30 [i915]
> >>> <4>[ 109.735286] Code: b8 f9 ff ff ff eb c2 e8 aa 5e 52 e1 e9 4f 0f 18 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 <8b> 57 10 b8 01 00 00 00 85 d2 74 15 48 8b 47 08 48 05 ff 0f 00 00
> >>> <4>[ 109.735288] RSP: 0018:ffffc90000f339a8 EFLAGS: 00010246
> >>> <4>[ 109.735289] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88810cea3a00
> >>> <4>[ 109.735290] RDX: 0000000000000000 RSI: ffffc90000f33af0 RDI: 0000000000000000
> >>> <4>[ 109.735292] RBP: ffff88811645d7c0 R08: 0000000000000000 R09: ffff888123afa940
> >>> <4>[ 109.735292] R10: 0000000000000001 R11: ffff888104b70040 R12: 0000000000000000
> >>> <4>[ 109.735293] R13: 0000000000000000 R14: ffffc90000f33b08 R15: ffffc90000f33af0
> >>> <4>[ 109.735294] FS: 0000000000000000(0000) GS:ffff8884ad680000(0000) knlGS:0000000000000000
> >>> <4>[ 109.735295] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> <4>[ 109.735296] CR2: 0000000000000010 CR3: 000000011f9c6003 CR4: 00000000003706e0
> >>> <4>[ 109.735297] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> <4>[ 109.735298] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> <4>[ 109.735299] Call Trace:
> >>> <4>[ 109.735300] <TASK>
> >>> <4>[ 109.735301] __i915_ttm_move+0x128/0x940 [i915]
> >>> <4>[ 109.735408] ? dma_resv_iter_next+0x91/0xb0
> >>> <4>[ 109.735412] ? dma_resv_iter_first+0x42/0xb0
> >>> <4>[ 109.735414] ? i915_deps_add_resv+0x4c/0xc0 [i915]
> >>> <4>[ 109.735520] i915_gem_obj_copy_ttm+0x12f/0x250 [i915]
> >>> <4>[ 109.735625] i915_ttm_restore+0x167/0x250 [i915]
> >>> <4>[ 109.735759] i915_gem_process_region+0x27a/0x3b0 [i915]
> >>> <4>[ 109.735881] i915_ttm_restore_region+0x4b/0x70 [i915]
> >>> <4>[ 109.735999] lmem_restore+0x3a/0x60 [i915]
> >>> <4>[ 109.736101] i915_gem_resume+0x4c/0x100 [i915]
> >>> <4>[ 109.736202] i915_drm_resume+0xc2/0x170 [i915]
> >>>
> >>> Plus some other less obvious issue(s) with some tests failing.
> >>>
> >>>> Christian.
>