On 05/10/2012 02:26 PM, Kay Sievers wrote: > On Thu, May 10, 2012 at 10:20 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote: >> On 05/10/2012 02:15 PM, Kay Sievers wrote: >>> On Thu, May 10, 2012 at 10:13 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote: >>>> On 05/10/2012 02:09 PM, Stephen Warren wrote: >>>>> On 05/10/2012 02:06 PM, Kay Sievers wrote: >>>>>> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote: >>>>>>> On 05/09/2012 12:31 PM, Stephen Warren wrote: >>>>>>>> For me, next-20120508 prints nothing when booted, and I think also >>>>>>>> hangs. To solve this, I reverted: >>>>>>>> >>>>>>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer >>>>>>>> >>>>>>>> In order to build, I also had to revert: >>>>>>>> >>>>>>>> c4e00da driver-core: extend dev_printk() to pass structured data >>>>>>>> >>>>>>>> Note: I'm running on an ARM system using a serial console, with >>>>>>>> earlyprintk enabled. >>>>>>> >>>>>>> This issue still occurs in next-20120510. >>>>>>> >>>>>>> I've tracked it down to the assignment of msg->ts_nsec near the end of >>>>>>> log_store(). If I comment this out, everything works. The problem is the >>>>>>> assignment, not the call to local_clock(): >>>>>>> >>>>>>> fails: >>>>>>> msg->ts_nsec = local_clock(); >>>>>>> fails: >>>>>>> msg->ts_nsec = 0;//local_clock(); >>>>>>> works: >>>>>>> //msg->ts_nsec = local_clock(); >>>>>> >>>>>> Weird. >>>>>> >>>>>> What happens if you change it to: >>>>>> cpu_clock(logbuf_cpu); >>>>>> ? >>>>>> >>>>>> If it works, the timestamps look ok? >>>>> >>>>> I doubt that would work - after all, assigning 0 fails, but not >>>>> performing the assignment at all works. But, I'll go try it... >>>> >>>> Calling cpu_clock() instead of local_clock() fails in the same way. >>> >>> Ok, didn't really see the assign to 0 you tried, sorry. :) >>> >>> And 'dmesg' works when you run the box with the line commented out? >>> >>> And 'cat /dev/kmsg'? >> >> Yes, both work and produce reasonable output. > > Really weird. I have zero ideas at what this could be. The messages in > /dev/kmsg have seqnums, you see the 0 as the first? > > 5,0,0;Linux version 3.4.0-rc6+ ... > 6,1,0;Command line: root=/dev/sda1 ... > >> "cat /dev/kmsg" does hang >> at the end of the log until I CTRL-C it - is that expected? > > Yeah, 'cat' is a blocking read(), tools would use poll(). Found the bug. __log_buf isn't correctly aligned, so the very first message gets the struct log written to an unaligned address. Subsequent entries would be correctly aligned in log_store(), except the kernel has already crashed by then. The ts_nsec field is a 64-bit write, which triggers the alignment requirement. I'm not quite sure why the previous 16-bit writes for the various len fields didn't experience an issue, since __log_buf ended up at an odd address for me. Perhaps ARM doesn't have an alignment requirement for 16-bit writes, but does for 32-bit and greater. I'll post a patch in a minute. -- To unsubscribe from this list: send the line "unsubscribe linux-next" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html