Quoting Tvrtko Ursulin (2018-12-03 17:11:59)
>
> On 03/12/2018 11:36, Chris Wilson wrote:
> > We inspect the requests under the assumption that they will be marked as
> > completed when they are removed from the queue. Currently however, in the
> > process of wedging the requests will be removed from the queue before they
> > are completed, so rearrange the code to complete the fences before the
> > locks are dropped.
> >
> > <1>[ 354.473346] BUG: unable to handle kernel NULL pointer dereference at 0000000000000250
> > <6>[ 354.473363] PGD 0 P4D 0
> > <4>[ 354.473370] Oops: 0000 [#1] PREEMPT SMP PTI
> > <4>[ 354.473380] CPU: 0 PID: 4470 Comm: gem_eio Tainted: G U 4.20.0-rc4-CI-CI_DRM_5216+ #1
> > <4>[ 354.473393] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
> > <4>[ 354.473480] RIP: 0010:__i915_schedule+0x311/0x5e0 [i915]
> > <4>[ 354.473490] Code: 49 89 44 24 20 4d 89 4c 24 28 4d 89 29 44 39 b3 a0 04 00 00 7d 3a 41 8b 44 24 78 85 c0 74 13 48 8b 93 78 04 00 00 48 83 e2 fc <39> 82 50 02 00 00 79 1e 44 89 b3 a0 04 00 00 48 8d bb d0 03 00 00
>
> This confuses me, isn't the code segment usually at the end?

*shrug* It was cut and paste.

> And then
> you have another after the call trace which doesn't match
> __i915_schedule.. anyways, _this_ code seems to be this part:
>
>          if (node_to_request(node)->global_seqno &&
>       90d:       8b 43 78                mov    eax,DWORD PTR [rbx+0x78]
>       910:       85 c0                   test   eax,eax
>       912:       74 13                   je     927 <__i915_schedule+0x317>
>
> i915_seqno_passed(port_request(engine->execlists.port)->global_seqno,
>       914:       49 8b 97 c0 04 00 00    mov    rdx,QWORD PTR [r15+0x4c0]
>       91b:       48 83 e2 fc             and    rdx,0xfffffffffffffffc
>          if (node_to_request(node)->global_seqno &&
>       91f:       39 82 50 02 00 00       cmp    DWORD PTR [rdx+0x250],eax
>       925:       79 1e                   jns    945 <__i915_schedule+0x335>
>
> > <4>[ 354.473515] RSP: 0018:ffffc900001bba90 EFLAGS: 00010046
> > <4>[ 354.473524] RAX: 0000000000000003 RBX: ffff8882624c8008 RCX: f34a737800000000
> > <4>[ 354.473535] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8882624c8048
> > <4>[ 354.473545] RBP: ffffc900001bbab0 R08: 000000005963f1f1 R09: 0000000000000000
> > <4>[ 354.473556] R10: ffffc900001bba10 R11: ffff8882624c8060 R12: ffff88824fdd7b98
> > <4>[ 354.473567] R13: ffff88824fdd7bb8 R14: 0000000000000001 R15: ffff88824fdd7750
> > <4>[ 354.473578] FS: 00007f44b4b5b980(0000) GS:ffff888277e00000(0000) knlGS:0000000000000000
> > <4>[ 354.473590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[ 354.473599] CR2: 0000000000000250 CR3: 000000026976e000 CR4: 0000000000340ef0
>
> Given the registers above, I think it means this - eax is global_seqno
> of the node rq. rdx is port_request so NULL and bang. No request in
> port, but why would there always be one at the point we are scheduling
> in a new request to the runnable queue?

Correct. The answer, as I chose to interpret it, is that the requests
which were submitted and then dequeued during cancellation had not yet
been marked complete, which is what this patch attempts to address.
-Chris
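
For anyone following the register analysis above, here is a rough userspace sketch of the faulting check. It is not the i915 code and not what the patch does (the patch reorders fence completion, it does not add a NULL check). struct request, port_pack() and node_already_executing() are hypothetical stand-ins; the real global_seqno sits at offset 0x250 of the request, which is why CR2 reads 0x250 when the masked port pointer is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the request; only the field the check reads. */
struct request {
	uint32_t global_seqno;
};

/* The "and rdx,0xfffffffffffffffc" implies the low two bits of the
 * port word hold something other than the pointer (assumed: a count). */
#define PORT_COUNT_MASK ((uintptr_t)3)

/* Pack a request pointer and a small count into one word. */
static inline uintptr_t port_pack(struct request *rq, unsigned int count)
{
	assert(((uintptr_t)rq & PORT_COUNT_MASK) == 0);
	assert(count <= PORT_COUNT_MASK);
	return (uintptr_t)rq | count;
}

/* Strip the count bits to recover the pointer; NULL for an empty port. */
static inline struct request *port_request(uintptr_t port)
{
	return (struct request *)(port & ~PORT_COUNT_MASK);
}

/* Wrap-safe "seq1 is at or after seq2", matching the cmp/jns pair. */
static inline bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Same shape as the condition in the disassembly, but tolerating an
 * empty port instead of dereferencing it. */
static bool node_already_executing(uintptr_t port,
				   const struct request *node_rq)
{
	const struct request *active = port_request(port);

	if (!node_rq->global_seqno)
		return false;	/* the "test eax,eax; je" early out */
	if (!active)
		return false;	/* rdx == 0: the case that oopsed */
	return seqno_passed(active->global_seqno, node_rq->global_seqno);
}
```

With RAX = 3 and RDX = 0 as in the dump, this corresponds to calling node_already_executing(0, &rq) with rq.global_seqno == 3: the node has a seqno, the port is empty, and the unguarded version reads from NULL + 0x250.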