Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/10/2012 02:26 PM, Kay Sievers wrote:
> On Thu, May 10, 2012 at 10:20 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote:
>> On 05/10/2012 02:15 PM, Kay Sievers wrote:
>>> On Thu, May 10, 2012 at 10:13 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote:
>>>> On 05/10/2012 02:09 PM, Stephen Warren wrote:
>>>>> On 05/10/2012 02:06 PM, Kay Sievers wrote:
>>>>>> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote:
>>>>>>> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>>>>>>>> For me, next-20120508 prints nothing when booted, and I think also
>>>>>>>> hangs. To solve this, I reverted:
>>>>>>>>
>>>>>>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>>>>>>>
>>>>>>>> In order to build, I also had to revert:
>>>>>>>>
>>>>>>>> c4e00da driver-core: extend dev_printk() to pass structured data
>>>>>>>>
>>>>>>>> Note: I'm running on an ARM system using a serial console, with
>>>>>>>> earlyprintk enabled.
>>>>>>>
>>>>>>> This issue still occurs in next-20120510.
>>>>>>>
>>>>>>> I've tracked it down to the assignment of msg->ts_nsec near the end of
>>>>>>> log_store(). If I comment this out, everything works. The problem is the
>>>>>>> assignment, not the call to local_clock():
>>>>>>>
>>>>>>> fails:
>>>>>>>        msg->ts_nsec = local_clock();
>>>>>>> fails:
>>>>>>>        msg->ts_nsec = 0;//local_clock();
>>>>>>> works:
>>>>>>>        //msg->ts_nsec = local_clock();
>>>>>>
>>>>>> Weird.
>>>>>>
>>>>>> What happens if you change it to:
>>>>>>   cpu_clock(logbuf_cpu);
>>>>>> ?
>>>>>>
>>>>>> If it works, the timestamps look ok?
>>>>>
>>>>> I doubt that would work - after all, assigning 0 fails, but not
>>>>> performing the assignment at all works. But, I'll go try it...
>>>>
>>>> Calling cpu_clock() instead of local_clock() fails in the same way.
>>>
>>> Ok, didn't really see the assign to 0 you tried, sorry. :)
>>>
>>> And 'dmesg' works when you run the box with the line commented out?
>>>
>>> And 'cat /dev/kmsg'?
>>
>> Yes, both work and produce reasonable output.
> 
> Really weird. I have zero ideas at what this could be. The messages in
> /dev/kmsg have seqnums, you see the 0 as the first?
> 
> 5,0,0;Linux version 3.4.0-rc6+ ...
> 6,1,0;Command line: root=/dev/sda1 ...
> 
>> "cat /dev/kmsg" does hang
>> at the end of the log until I CTRL-C it - is that expected?
> 
> Yeah, 'cat' is a blocking read(), tools would use poll().

Found the bug. __log_buf isn't correctly aligned, so the very first
message gets the struct log written to an unaligned address. Subsequent
entries would be correctly aligned in log_store(), except the kernel has
already crashed by then.

The ts_nsec field is a 64-bit write, which triggers the alignment
requirement. I'm not quite sure why the previous 16-bit writes for the
various len fields didn't experience an issue, since __log_buf ended up
at an odd address for me. Perhaps ARM doesn't have an alignment
requirement for 16-bit writes, but does for 32-bit and greater.

I'll post a patch in a minute.
--
To unsubscribe from this list: send the line "unsubscribe linux-next" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel]     [Linux USB Development]     [Yosemite News]     [Linux SCSI]

  Powered by Linux