Re: kernel panics with 4.14.X versions

On 17/04/2018 02:12 PM, Jan Kara wrote:
> On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
>> On 16/04/2018 04:40 PM, Jan Kara wrote:
> 
> <snip>
> 
>>> How easily can you hit this?
>>
>> Very easily, I only need to wait 1-2 days for a crash to occur.
> 
> I wouldn't call that very easily but opinions may differ :). Anyway it's
> good (at least for debugging) that it's reproducible.
> 

Unfortunately, I can't reproduce it on demand, so waiting 1-2 days for a crash
is the only option I have.

>>> Are you able to run debug kernels
>>
>> Well, I was under the impression I do as I have:
>>   grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
>>   CONFIG_DEBUG_INFO=y
>>   # CONFIG_DEBUG_INFO_REDUCED is not set
>>   # CONFIG_DEBUG_INFO_SPLIT is not set
>>   # CONFIG_DEBUG_INFO_DWARF4 is not set
>>   CONFIG_DEBUG_KERNEL=y
>>
>> Do you think that my kernel doesn't produce a proper crash dump?
>> I have a production cluster where I can run any kernel we need, so if I need
>> to compile again with different settings I can certainly do that.
> 
> OK, good. So please try running 4.16 as you mention below to verify whether
> this is just a -stable regression or also a problem in the current upstream
> kernel. Based on your results with 4.16 I'll prepare a debug patch for you to
> apply on top of 4.14.32 so that we can debug this further.
> 
>>> / inspect
>>> crash dumps when the issue occurs?
>>
>> I can't do that as the server isn't responsive and I can only power cycle it.
> 
> Well, kernel crash dumps work in that situation as well - when the kernel
> panics, it will kexec into a new kernel and dump memory of the old kernel
> to disk. It can then be investigated with the 'crash' utility. But
> obviously you don't have this set up and don't have experience with this so
> let's go via a standard 'debug patch' route.
> 
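
A minimal sketch of how one might check whether the kdump prerequisites
described above are in place, assuming a CentOS 7 style system. The helper
names has_crashkernel and crash_kernel_status are hypothetical, made up for
this example; /proc/cmdline and /sys/kernel/kexec_crash_loaded are the
standard kernel interfaces, and the latter reads 1 once a crash kernel has
been loaded via kexec.

```shell
#!/bin/sh
# Sketch only: reports whether memory is reserved for a crash kernel and
# whether a crash kernel is currently loaded. It does not configure kdump.

# Succeeds if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a one-line status for a kexec_crash_loaded value ("1" means loaded).
crash_kernel_status() {
    if [ "$1" = "1" ]; then
        echo "crash kernel loaded"
    else
        echo "crash kernel not loaded"
    fi
}

# Read the running kernel's command line; tolerate a missing file.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
if has_crashkernel "$cmdline"; then
    echo "crashkernel= is set on the kernel command line"
else
    echo "no crashkernel= reservation found"
fi

# Read the kexec state exported by the kernel; tolerate a missing file.
loaded=$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null || true)
crash_kernel_status "$loaded"
```

If both checks pass, a panic should kexec into the crash kernel and leave a
vmcore on disk, which can then be opened with the 'crash' utility together
with a vmlinux that has debug info.
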
>>> Also testing with the latest mainline kernel (4.16) would be welcome, to
>>> check whether this is just an issue with the backport of fsnotify fixes
>>> from Miklos.
>>
>> I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).
> 
> Yes, that would be good.
> 

I have a production server running 4.16.2 and no kernel crash dumps yet.
Let's wait another day before we say anything.

Cheers,
Pavlos
