On 18/02/2021 10:35, Nicolin Chen wrote:
> Hi Guillaume,
>
> Thank you for the test results! And sorry for my belated reply.

No worries :)

> On Thu, Feb 11, 2021 at 03:50:05PM +0000, Guillaume Tucker wrote:
>>> On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
>>>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>>>> and the commit-reverted version (passing), each applying this patch.
>>>>
>>>> Sure, I've run 3 jobs:
>>>>
>>>> * v5.11-rc6 as a reference, to see the original issue:
>>>> https://lava.collabora.co.uk/scheduler/job/3187848
>>>>
>>>> * + your debug patch:
>>>> https://lava.collabora.co.uk/scheduler/job/3187849
>>>>
>>>> * + the "breaking" commit reverted, passing the tests:
>>>> https://lava.collabora.co.uk/scheduler/job/3187851
>>>
>>> Thanks for the help!
>>>
>>> I am able to figure out what's probably wrong, yet not so sure
>>> about the best solution at this point.
>>>
>>> Would it be possible for you to run one more time with another
>>> debugging patch? I'd like to see the same logs as previous:
>>> 1. Vanilla kernel + debug patch
>>> 2. Vanilla kernel + Reverted + debug patch
>>
>> As it turns out, next-20210210 is passing all the tests again so
>> it looks like this got fixed in the meantime:
>>
>> https://lava.collabora.co.uk/scheduler/job/3210192
>
> I checked this passing log, however, found that the regression is
> still there though test passed, as the prints below aren't normal:
> tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> EMEM address decode error (SMMU translation error [--S])
> tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> Page fault (SMMU translation error [--S])

Ah yes sorry, there are other KernelCI checks for kernel errors
but that wasn't enabled in the bisection so I didn't notice them.

> I was trying to think of a simpler solution than a revert. However,
> given the fact that the callback sequence could change -- guessing
> likely a recent change in iommu core, I feel it safer to revert my
> previous change, not necessarily being a complete revert though.
>
> I attached my partial reverting change in this email. Would it be
> possible for you to run one more test for me to confirm it? It'd
> keep the tests passing while eliminating all error prints above.
>
> If the fix works, I'll re-send it to mail list by adding a commit
> message.

Sure, here's next-20210218 as a reference:

https://lava.collabora.co.uk/scheduler/job/3241236

and here with your patch applied on top of it:

https://lava.collabora.co.uk/scheduler/job/3241246

The git branch I've used where your patch is applied:

https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210218-nyan-big-drm-read/

The errors seem to have disappeared but I'll let you double check
that things are all back to a working state.

BTW: This thread is a good example of how having an "on-demand"
KernelCI service to let developers re-run tests with extra
patches would allow them to fix issues independently. We'll keep
that in mind for the future.

Best wishes,
Guillaume