Re: System freezes with kernel >2.6.19 - sata_nv [added crash info kern 2.6.21.1]

Tejun Heo wrote:
> Robert Hancock wrote:
>   
>> Tejun Heo wrote:
>>     
>>> Stefan wrote:
>>>       
>>>> Hi folks,
>>>>
>>>> yesterday I upgraded from kernel 2.6.19 to 2.6.20 (gentoo kernel). Now my
>>>> box locks up about 10 min after boot.
>>>> After that I tested a vanilla 2.6.21.1; it shows the same behavior.
>>>> I'm attaching a kern log file from the 2.6.20. The 2.6.21.1 locked up so
>>>> hard that there was no trace left in the log file.
>>>> If I switch back to 2.6.19 everything is fine again.
>>>> If necessary I will hook up a laptop to this box, so I can capture
>>>> messages via netconsole.
>>>>         
>>> Yes please.
>>>       


Okay, I had time to set this up. I'm attaching the log messages I got
via netconsole.

>>>       
>>>> This machine is running an AMD X2 64, NFORCE4 (ASUS A8N-E)
>>>>
>>>> I haven't been following the development of libata for a while, but from
>>>> the 2.6.20 changelog it looks like there have been some major changes.
>>>>
>>>> Just let me know what kind of information you need in order to narrow it
>>>> down.
>>>>         
>>> Does giving 'sata_nv.adma=0' kernel parameter make any difference?
>>>       

I tested for about 20 hours with ADMA disabled, and the crash did not occur.

If I remove

sata_nv.adma=0

from the boot options again, it doesn't take long until my machine locks up.
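
For anyone repeating this test, it helps to confirm which setting the running kernel was actually booted with before drawing conclusions from a lockup (or the lack of one). The following is only a minimal sketch: the helper names `module_param` and `read_cmdline` are made up for illustration, and it assumes that the boot options are visible in /proc/cmdline and that a later occurrence of the same option overrides an earlier one.

```python
def module_param(cmdline, module, param):
    """Return the value given for module.param on a kernel command line.

    Returns None if the option is absent. If the option appears more than
    once, the last value is returned (an assumption of this sketch).
    """
    prefix = module + "." + param + "="
    value = None
    for token in cmdline.split():
        if token.startswith(prefix):
            value = token[len(prefix):]
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    adma = module_param(read_cmdline(), "sata_nv", "adma")
    if adma is None:
        print("sata_nv.adma not set on the command line (driver default applies)")
    else:
        print("sata_nv.adma=" + adma)
```

This only reports what was passed at boot; it does not show what the driver actually did with the value.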

[Attached dmesg output with 2.6.21.1 kernel + crash info I got via
netconsole]


I hope this is useful to you guys.

Cheers Stefan

>> If adma=0 works, then that means it's either an ADMA related problem or
>> that the SAMSUNG HD401LJ drive has some problems with NCQ (since ADMA off
>> means NCQ off as well). I would say the latter is more likely.
>>     
>
> I don't have first-hand experience with the particular model but I'll be
> surprised if they screwed their firmware up with the new generation of
> hard disks.  Firmware on the previous generation drives was pretty good
> and they don't get worse usually.
>
>   
>> We should really have some kind of "noncq" kernel parameter we can use
>> to help debugging these problems. Though, later kernels are supposed to
>> switch it off automatically after too many errors.
>>     
>
> Till now there hasn't been any case where a broken NCQ prevented a
> machine from booting but, yeah, having such a thing would be nice for
> debugging.
>
>   

Attachment: bootlog.txt.gz
Description: Unix tar archive

Attachment: crashtrace.txt.gz
Description: Unix tar archive

