On Thu, 4 Feb 2021 at 13:11, Marcin Ślusarz <marcin.slusarz@xxxxxxxxx> wrote:
>
> On Mon, 1 Feb 2021 at 13:16, Marcin Ślusarz <marcin.slusarz@xxxxxxxxx> wrote:
> >
> > On Mon, 1 Feb 2021 at 12:43, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 29, 2021 at 9:03 PM Marcin Ślusarz <marcin.slusarz@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, 29 Jan 2021 at 19:59, Marcin Ślusarz <marcin.slusarz@xxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, 28 Jan 2021 at 15:32, Marcin Ślusarz <marcin.slusarz@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu, 28 Jan 2021 at 13:39, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > > > > > The only explanation for that I can think about (and which does not
> > > > > > > involve supernatural intervention so to speak) is a stack corruption
> > > > > > > occurring between these two calls in sdw_intel_acpi_cb(). IOW,
> > > > > > > something scribbles on the handle in the meantime, but ATM I have no
> > > > > > > idea what that can be.
> > > > > >
> > > > > > I tried KASAN but it didn't find anything and kernel actually booted
> > > > > > successfully.
> > > > >
> > > > > I investigated this and it looks like a compiler bug (or something nastier),
> > > > > but I can't find where exactly registers get corrupted because if I add printks
> > > > > the corruption seems on the printk side, but if I don't add them it seems
> > > > > the value gets corrupted earlier.
> > > > (...)
> > > > > I'm using gcc 10.2.1 from Debian testing.
> > > >
> > > > Someone on IRC, after hearing only that "gcc miscompiles the kernel",
> > > > suggested disabling CONFIG_STACKPROTECTOR_STRONG.
> > > > It helped indeed and it matches my observations, so it's quite likely it
> > > > is the culprit.
> > > >
> > > > What do we do now?
> > > >
> > > Figure out why the stack protection kicks in, I suppose.
> > >
> > > The target object is not on the stack, so if the pointer to it is
> > > valid (we need to verify somehow that it is indeed), dereferencing it
> > > shouldn't cause the stack protection to trigger.
> >
> > Well, the problem is not that stack protector finds something, but
> > the feature itself corrupts some registers.
>
> I retract this statement.
>
> Originally I based it on this piece of code:
>    0xffffffff815781f0 <+35>:    mov    %r12,%rdx
>    0xffffffff815781f3 <+38>:    mov    $0xffffffff81eca4c0,%rsi
>    0xffffffff815781fa <+45>:    mov    $0xffffffff82146d46,%rdi
>    0xffffffff81578201 <+52>:    call   0xffffffff818909f1 <printk>
>    0xffffffff81578206 <+57>:    cmpb   $0xf,0x8(%r12)
> where crash is on the last line and I supposedly could see the message
> printed by printk with the correct value of %r12.
> However, after attaching kgdb+kgdboe (it's so much pain...) to the kernel
> I discovered that something corrupts memory so much that the formatting
> string becomes "", which means that I don't actually see the output of printk.

Oh crap, I can't reproduce it anymore.
I might have tried this before I disabled KASLR, which would explain why
I saw "" as the format string (with KASLR enabled, 0xffffffff82146d46
would not have been its real address).
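
For illustration only, here is a rough sketch of why a link-time address
is useless under KASLR: every kernel address is shifted by one random
offset chosen at boot. Only 0xffffffff82146d46 is taken from the
disassembly above. The link-time address of _text is an assumption (the
usual x86-64 default) and the runtime address of _text is made up; on a
real system it would be read from /proc/kallsyms as root.

```python
# Sketch: translate a link-time kernel address to its runtime address under KASLR.
# LINK_TIME_TEXT is an assumption (usual x86-64 default); RUNTIME_TEXT is made up.
LINK_TIME_TEXT = 0xffffffff81000000  # assumed address of _text in vmlinux
RUNTIME_TEXT = 0xffffffff9a000000    # hypothetical address of _text after randomization


def kaslr_offset(link_addr: int, runtime_addr: int) -> int:
    """Return the boot-time slide, derived from one symbol known in both views."""
    return runtime_addr - link_addr


def to_runtime(link_addr: int, offset: int) -> int:
    """Apply the slide to any other link-time address."""
    return link_addr + offset


offset = kaslr_offset(LINK_TIME_TEXT, RUNTIME_TEXT)

# Address of the printk format string as it appears in the disassembly.
fmt_link = 0xffffffff82146d46
fmt_runtime = to_runtime(fmt_link, offset)

print(f"KASLR offset:           {offset:#x}")
print(f"format string (link):   {fmt_link:#x}")
print(f"format string (actual): {fmt_runtime:#x}")
```

Reading memory at the link-time address on such a kernel returns
whatever happens to be mapped there (or nothing), not the format string,
which is why booting with nokaslr makes debugger sessions much easier to
trust.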