Re: CPUfreq fail on rk3399-firefly

Kevin Hilman <khilman@xxxxxxxxxxxx> writes:

> Kever Yang <kever.yang@xxxxxxxxxxxxxx> writes:
>
>> Hi Kevin, Heiko,
>>
>> On 2019/8/22 2:59 AM, Kevin Hilman wrote:
>>> Hi Heiko,
>>>
>>> Heiko Stuebner <heiko@xxxxxxxxx> writes:
>>>
>>>> On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>
>>>>> Mark Brown <broonie@xxxxxxxxxx> writes:
>>>>>
>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>
>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>
>>>>>>> arm64:
>>>>>>>      defconfig:
>>>>>>>          gcc-8:
>>>>>>>              rk3399-firefly: 1 failed lab
>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>
>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>
>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>>> looking.
>>>>>>
>>>>>> Full bootlog and other info can be found here:
>>>>>>
>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>> makes the firefly board start working again.
>>>>>
>>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>>> governor as the default governor, so during kernel boot, it will always
>>>>> switch to the max frequency.
>>>>>
>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>> informative splat[1]
>>>>>
>>>>> There is still an OPP change happening because the detected OPP is not
>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>> and fails in the bowels of clk_set_rate()
>>>> Though I think that might only be a symptom as well.
>>>> Both the PLL setting code and the actual cpu-clock implementation
>>>> have been unchanged since 2017 (and run just fine on all boards in my farm).
>>>>
>>>> One source for these issues is often the regulator supplying the cpu
>>>> going haywire - aka the voltage not matching the opp.
>>>>
>>>> Since it's CPU4 being set in this error case, it might be the supply
>>>> of the big cluster, the external syr827 (fan53555 clone), that is
>>>> acting up. In the Firefly-rk3399 case this is even stranger.
>>>>
>>>> There is a discrepancy in the "fcs,suspend-voltage-selector" setting
>>>> between different bootloader versions (how the selection pin is set up),
>>>> so the kernel might actually write its requested voltage to the wrong
>>>> register (not the one for the actual voltage, but the second set used for
>>>> the suspend voltage).
>>>>
>>>> Did you by chance swap bootloaders at some point in the recent past?
>>> No, I haven't touched the bootloader since I initially set up the board.
>>
>> The CPU voltage is not affected by the bootloader, because the kernel
>> should have its own opp-table; the bootloader may only affect the
>> center/logic power supply.
>>
>>>
>>>> I'd assume [2] might actually be the same issue last year, though
>>>> the CI-logs are not available anymore it seems.
>>>>
>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>>> cpufreq for this cluster defers and see what happens?
>>> Yes, this change[1] definitely makes things boot reliably again, so
>>> there's definitely something a bit unstable with this regulator, at
>>> least on this firefly.
>>
>> Is it possible to identify which patch introduced this bug? This board
>> should have worked correctly for a long time with upstream source code.
>
> Unfortunately, it seems to be a regular, but intermittent failure, so
> bisection is not producing anything reliable.
>
> You can see that both in mainline[1] and in linux-next[2] there are
> periodic failures, but it's hard to see any patterns.

Even worse, I (re)tested mainline for versions that were previously
passing (v5.2, v5.3-rc5) and they are also failing now.

They work again if I disable that regulator as suggested by Heiko.

So this is increasingly pointing to failing hardware.

Kevin
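
For readers following the log earlier in the thread, here is a rough sketch of the "unlisted frequency" path. It is a simplified Python model of how the cpufreq core appears to pick a listed OPP when a CPU comes up at a rate missing from the table, not the kernel code itself. Assumptions: the selection rule (lowest listed frequency at or above current minus one, falling back to the highest listed one) is a reading of the core's behaviour that the thread does not confirm; every table entry other than 408000 is made up for illustration; ClockError, pick_listed_freq and bring_online are hypothetical names.

```python
import bisect

# Hypothetical OPP table in kHz. Only 408000 appears in the log above;
# the other entries are placeholders for illustration.
OPP_TABLE_KHZ = [408000, 600000, 816000, 1008000]


class ClockError(Exception):
    """Stands in for clk_set_rate() returning a negative errno such as -22."""


def pick_listed_freq(current_khz, table_khz):
    """Pick the lowest listed frequency >= current_khz - 1.

    If no listed frequency is that high, fall back to the highest one.
    This selection rule is an assumption about the core's behaviour.
    """
    table = sorted(table_khz)
    target = current_khz - 1
    i = bisect.bisect_left(table, target)
    return table[i] if i < len(table) else table[-1]


def bring_online(cpu, current_khz, table_khz, set_rate):
    """Model the online step: keep a listed rate, otherwise move to one.

    set_rate is a caller-supplied callable that may raise ClockError.
    In the kernel a failure at this point ends in BUG(); here the
    exception simply propagates to the caller.
    """
    if current_khz in table_khz:
        return current_khz
    print(f"CPU{cpu}: Running at unlisted freq: {current_khz} KHz")
    new_khz = pick_listed_freq(current_khz, table_khz)
    set_rate(new_khz)
    print(f"CPU{cpu}: Unlisted initial frequency changed to: {new_khz} KHz")
    return new_khz
```

With this table, pick_listed_freq(200000, OPP_TABLE_KHZ) selects 408000, which matches the CPU0 lines in the log. For CPU4 the selection step works the same way; the failure only shows up when the rate change itself is rejected, which is consistent with the regulator or clock path, rather than the table lookup, being the problem.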



