On Thu, Jul 20, 2023 at 11:32:28AM +0100, Will Deacon wrote:
> On Tue, Jul 11, 2023 at 06:57:33PM +0100, Mark Brown wrote:
>
> > Still investigating but I'm pretty convinced this is nothing to do with
> > your commit/series and is just common or garden memory corruption that
> > just happens to get tickled by your changes. Sorry for the noise.
>
> Did you get to the bottom of this? If not, do you have a reliable way to
> reproduce the problem? I don't like the sound of memory corruption :(

Not to the bottom of it, but getting there. I isolated the issue to
something in the unregistration path for thermal zones but didn't manage
to figure out exactly what. There was some indication that it might be a
use after free, but I'm not convinced.

I have a reliable way to reproduce this if you have a pine64plus. It also
shows up a lot on the Libretech Tritium, though since Hugh's changes not
quite as reliably as on the pine64plus. Equally, the pine64plus was rock
solid until those changes, so there's some timing/environment factor that
makes the issue manifest obviously. I expect you should be able to trigger
the issue by unregistering a thermal driver, but the effects might not be
visible.

There is a change on the list to stop the Allwinner SoCs triggering the
issue during boot (their thermal driver refuses to register if any one zone
fails, but most of their SoCs have multiple thermal zones with only one
fully described), but the underlying bug needs fixing either way.