Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

On 25.01.23 at 13:53, Matthew Auld wrote:
On Wed, 25 Jan 2023 at 11:35, Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
On 25.01.23 at 11:21, Matthew Auld wrote:
On Wed, 25 Jan 2023 at 10:07, Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
On 25.01.23 at 10:56, Matthew Auld wrote:
On Tue, 24 Jan 2023 at 17:15, Matthew Auld
<matthew.william.auld@xxxxxxxxx> wrote:
On Tue, 24 Jan 2023 at 13:48, Matthew Auld
<matthew.william.auld@xxxxxxxxx> wrote:
On Tue, 24 Jan 2023 at 12:57, Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>

Make sure we can at least move and alloc TT objects without backing store.

v2: clear the tt object even when no resource is allocated.
v3: add Matthew's changes for i915 as well.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
Reviewed-by: Matthew Auld <matthew.auld@xxxxxxxxx>
Ofc that assumes intel-gfx CI is now happy with the series.
There are still some nasty failures it seems (in the extended test
list). But it looks like the series is already merged. Can we quickly
revert and try again?
Ah, crap. I thought everything would be fine after the CI gave its go.

Which patch is causing the fallout?
I'm not sure. I think all of the patches kind of interact with each
other, but for sure there is an issue with the first patch. There is
one splat like:
Well, I would rather revert as little as possible.

Are you sure that this isn't only on some i915-specific branch with
changes that are not yet upstream?
Yeah, that splat is taken directly from the CI results reported with
this series. So it's just your series applied on top of drm-tip.

Can you take a look at the first patch here:
https://patchwork.freedesktop.org/series/113332/

Maybe you have a better idea? For reference the IGTs that we have for
verifying userspace object clearing are now failing, so hoping that
fixes it. The other two patches I'm hoping will fix the splat.

The TTM change looks like a good idea to me. Feel free to add my rb to this one.

I can't say much about the i915 changes.

Maybe we should, for now, revert the two TTM patches that skip allocating resources, and fix i915 first?

Christian.


I can't even find the i915_gem_obj_copy_ttm function in drm-misc-next
nor drm-next.

Regards,
Christian.

<1>[  109.735148] BUG: kernel NULL pointer dereference, address: 0000000000000010
<1>[  109.735151] #PF: supervisor read access in kernel mode
<1>[  109.735152] #PF: error_code(0x0000) - not-present page
<6>[  109.735153] PGD 0 P4D 0
<4>[  109.735155] Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[  109.735157] CPU: 1 PID: 92 Comm: kworker/u12:6 Not tainted 6.2.0-rc5-Patchwork_113269v1-gc4d436608c4e+ #1
<4>[  109.735159] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390 Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
<4>[  109.735160] Workqueue: events_unbound async_run_entry_fn
<4>[  109.735163] RIP: 0010:i915_ttm_resource_mappable+0x4/0x30 [i915]
<4>[  109.735286] Code: b8 f9 ff ff ff eb c2 e8 aa 5e 52 e1 e9 4f 0f 18 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 <8b> 57 10 b8 01 00 00 00 85 d2 74 15 48 8b 47 08 48 05 ff 0f 00 00
<4>[  109.735288] RSP: 0018:ffffc90000f339a8 EFLAGS: 00010246
<4>[  109.735289] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88810cea3a00
<4>[  109.735290] RDX: 0000000000000000 RSI: ffffc90000f33af0 RDI: 0000000000000000
<4>[  109.735292] RBP: ffff88811645d7c0 R08: 0000000000000000 R09: ffff888123afa940
<4>[  109.735292] R10: 0000000000000001 R11: ffff888104b70040 R12: 0000000000000000
<4>[  109.735293] R13: 0000000000000000 R14: ffffc90000f33b08 R15: ffffc90000f33af0
<4>[  109.735294] FS:  0000000000000000(0000) GS:ffff8884ad680000(0000) knlGS:0000000000000000
<4>[  109.735295] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  109.735296] CR2: 0000000000000010 CR3: 000000011f9c6003 CR4: 00000000003706e0
<4>[  109.735297] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  109.735298] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[  109.735299] Call Trace:
<4>[  109.735300]  <TASK>
<4>[  109.735301]  __i915_ttm_move+0x128/0x940 [i915]
<4>[  109.735408]  ? dma_resv_iter_next+0x91/0xb0
<4>[  109.735412]  ? dma_resv_iter_first+0x42/0xb0
<4>[  109.735414]  ? i915_deps_add_resv+0x4c/0xc0 [i915]
<4>[  109.735520]  i915_gem_obj_copy_ttm+0x12f/0x250 [i915]
<4>[  109.735625]  i915_ttm_restore+0x167/0x250 [i915]
<4>[  109.735759]  i915_gem_process_region+0x27a/0x3b0 [i915]
<4>[  109.735881]  i915_ttm_restore_region+0x4b/0x70 [i915]
<4>[  109.735999]  lmem_restore+0x3a/0x60 [i915]
<4>[  109.736101]  i915_gem_resume+0x4c/0x100 [i915]
<4>[  109.736202]  i915_drm_resume+0xc2/0x170 [i915]

Plus some other less obvious issue(s) with some tests failing.
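
For anyone reading the splat: RDI is 0 and CR2 is 0x10, and the faulting bytes `8b 57 10` decode to a 32-bit load from offset 0x10 of the first argument, so i915_ttm_resource_mappable() seems to have been handed a NULL resource. Below is a minimal userspace sketch of that failure shape and one possible guard. Everything in it is hypothetical (struct demo_resource, its field layout, DEMO_PL_SYSTEM/DEMO_PL_LMEM, and the choice to treat "no resource" as mappable); it is not the real struct ttm_resource nor the fix that was actually proposed for i915.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TTM resource; NOT the kernel layout. */
struct demo_resource {
	uint64_t size;          /* offset 0x00: total size in bytes */
	uint64_t visible_size;  /* offset 0x08: CPU-visible part in bytes */
	uint32_t mem_type;      /* offset 0x10: placement */
};

/* Hypothetical placement values for this sketch only. */
#define DEMO_PL_SYSTEM 0u
#define DEMO_PL_LMEM   1u

/* The first load in the check hits offset 0x10, matching CR2 in the oops. */
static_assert(offsetof(struct demo_resource, mem_type) == 0x10,
	      "mem_type must sit at offset 0x10 for the analogy to hold");

/*
 * Assumption: an object without a backing resource is treated like
 * system memory, i.e. mappable. Without the NULL check, reading
 * res->mem_type on a NULL pointer would fault at address 0x10.
 */
static bool demo_resource_mappable(const struct demo_resource *res)
{
	if (!res)
		return true;

	if (res->mem_type == DEMO_PL_SYSTEM)
		return true;

	/* Non-system memory is mappable only if fully CPU-visible. */
	return res->visible_size == res->size;
}
```

Whether returning true for a NULL resource is the right policy depends on what the callers in the move path expect, so take that part as a guess.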

Christian.