Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)

On 08/11/17 15:19, Guillaume Tucker wrote:
On 07/11/17 11:43, Guillaume Tucker wrote:
On 07/11/17 10:55, Mark Brown wrote:
On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
On 06/11/17 19:17, Mark Brown wrote:

    multi_v7_defconfig:
        tegra124-nyan-big:
            lab-collabora: failing since 2 days (last pass: next-20171102 - first fail: next-20171103)

Thanks for the report. I have been looking into a failure on nyan-big
[0], but this one looks like a new failure. I will take a look.

Guillaume Tucker has been bisecting this with the shiny new bisection
code he's testing; he was saying on IRC that he thinks he's found the
offending commit:

https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt

(not CCing Johannes yet)

Please take this with a pinch of salt, I'm now running some extra
boot tests to prove it.  If you look at this log, all the boots
passed, which is a bit suspicious.  I did build and boot the
revision it found with multi_v7_defconfig on tegra124 and it
passed, so it looks like this commit may not have anything to do
with the boot failure.  The automated bisection is still experimental.

Passing LAVA boot test with this revision:

  https://lava.collabora.co.uk/scheduler/job/976375

I've now started a slightly different bisection job on
next-20171107 and the common ancestor between next and mainline;
results can take a few hours to come back.
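
For readers unfamiliar with the technique, the bisection described
above can be sketched as a binary search over an ordered list of
commits. This is only an illustration, not the kernelci.org code:
the names first_bad and boots are hypothetical, and the sketch
assumes a linear history with a single transition from booting to
not booting. It checks both endpoints first, since a run in which
every boot passes (as in the log above) cannot identify a culprit.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], boots: Callable[[str], bool]) -> str:
    """Return the first commit that fails to boot.

    `commits` is ordered oldest to newest. `boots` is a hypothetical
    callback standing in for "build this revision and run a boot test".
    """
    # Bisection is only meaningful if the endpoints really differ.
    if not boots(commits[0]):
        raise ValueError("good endpoint does not boot")
    if boots(commits[-1]):
        raise ValueError("bad endpoint boots; nothing to bisect")

    good, bad = 0, len(commits) - 1
    # Invariant: commits[good] boots, commits[bad] does not.
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(commits[mid]):
            good = mid
        else:
            bad = mid
    return commits[bad]
```

Each iteration halves the range, so about log2(N) boot tests are
needed for N commits, which is why a job can still take hours when
every step involves a full build and a boot on real hardware.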

After a few more automated bisection attempts and a bug fix in
LAVA, I've now found at least one potentially breaking commit:

   commit d89e2378a97fafdc74cbf997e7c88af75b81610a
   Author: Robin Murphy <robin.murphy@xxxxxxx>
   Date:   Thu Oct 12 16:56:14 2017 +0100

       drivers: flag buses which demand DMA configuration


I've run some boot tests manually with this revision and again
after reverting it in place; these failed and passed
respectively:

   * d89e2378, failed:
     https://lava.collabora.co.uk/scheduler/job/978968

   * d89e2378 reverted, passed:
     https://lava.collabora.co.uk/scheduler/job/978969


I then went on and tried the same but on top of next-20171108 and
found that they both failed:

   * next-20171108, failed:
     https://lava.collabora.co.uk/scheduler/job/979063

   * next-20171108 with d89e2378 reverted, failed as well:
     https://lava.collabora.co.uk/scheduler/job/979167
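
The four results above form a small truth table. A minimal sketch
of how they can be read follows; the function name interpret and
the wording of its findings are made up for illustration, and it
assumes each boot result is reproducible.

```python
from typing import List


def interpret(base_fails: bool, base_reverted_fails: bool,
              tip_fails: bool, tip_reverted_fails: bool) -> List[str]:
    """Read four boot results: the suspect revision and the -next tip,
    each tested with and without the suspect commit reverted."""
    findings = []
    if base_fails and not base_reverted_fails:
        # Reverting fixes the boot at the suspect's own revision.
        findings.append("suspect commit breaks the boot at its own revision")
    if tip_fails and tip_reverted_fails:
        # Reverting is not sufficient at the tip.
        findings.append("another breaking change is likely present at the tip")
    if tip_fails and not tip_reverted_fails:
        findings.append("suspect commit alone explains the tip failure")
    return findings


# The results reported above: d89e2378 failed, its revert passed,
# next-20171108 failed, and next-20171108 with the revert failed too.
for finding in interpret(True, False, True, True):
    print(finding)
```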


So this shows there is almost certainly another offending commit
in -next.  The errors in the two cases are not quite the same: the
last one is triggered by a BUG whereas the first one is a NULL
pointer (I haven't looked any further).  Also, I don't think
there's any fix yet for d89e2378a97fafdc74cbf997e7c88af75b81610a,
which is currently still in next.

The fix was actually posted before said commit was even written:

https://patchwork.kernel.org/patch/9967847/

What is currently queued in the DMA tree fell out of the discussion
on patch 2 of that series, but I kind of assumed the host1x folks
would still take patch 1; I guess that hasn't happened.

Robin.


Note: This happens to be a very good example of running a
kernelci.org bisection on a real issue; it's quite a bit of a
pipe cleaner.  I'll now see if there's a way to bisect what looks
like another breaking change in-between.

Guillaume


