Hello,
I am currently trying to set up a newly built system with a Skylake
6700K CPU, but I get a fully reproducible kernel panic every time I
connect a monitor to the DisplayPort connector of the Intel integrated
graphics.
The panic occurs either immediately upon connecting a DisplayPort
monitor while the machine is up, or late in the boot process if the
monitor is already connected at boot time.
The monitor is a Dell U3415W ultrawide and the motherboard is an MSI
Z170A Gaming M7.
I am not entirely surprised by the link training errors, as there appear
to be various posts from users having problems with this monitor and
DisplayPort link training. What surprises me most is that it causes a
kernel panic.
When the panic happens, the kernel prints the following dump to the
second, non-DP monitor. Note that this is hand copied, as I have no way
to dump the messages anywhere but the display, so pardon any small
typos.
[ 22.318630] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.365449] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.420272] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.475105] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.529931] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.584759] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.639588] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.649935] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.650532] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 24.329955] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[ 25.345911] Shutting down cpus with NMI
[ 25.356092] Kernel offset: disabled
[ 25.356101] Rebooting in 30 seconds.
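In case it helps with reading the timings, here is a minimal Python
sketch that pulls the timestamps out of a few of the lines above and
computes the delay between the last link training error and the panic.
It is not part of any kernel tooling; the helper names parse and
gap_before_panic are made up for illustration, and the excerpt is the
same hand-copied text, so it carries the same caveat about typos.

```python
import re

# Hand-copied excerpt of the dump: first and last link training errors,
# followed by the panic line.
LOG = """\
[ 22.318630] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 22.650532] [drm:intel_dp_start_link_train [i915]] *ERROR* too many full retries, give up
[ 24.329955] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
"""

# Matches "[ <seconds>.<microseconds>] <message>" as printed by the kernel.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return (timestamp_seconds, message) pairs for each matching line."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def gap_before_panic(entries):
    """Seconds between the last link training error and the panic line."""
    last_error = max(t for t, msg in entries if "too many full retries" in msg)
    panic = next(t for t, msg in entries if msg.startswith("Kernel panic"))
    return panic - last_error


entries = parse(LOG)
print(f"{gap_before_panic(entries):.6f} s between last error and panic")
```

With the timestamps as copied, the panic follows the last link training
error by roughly 1.68 seconds.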
When running kernel 4.2, these errors are occasionally followed by what
seems to be a machine check exception (MCE) mentioning a corrupt
processor context, which is very hard to note down as it is only on the
screen very briefly. However, when running the latest kernel from
https://github.com/torvalds/linux, only the above error occurs, not the
MCE. Because of this, and because the system otherwise tests out fine, I
am fairly confident the MCE is spurious.
I apologise if this report is a little sparse on details. It is very
hard to debug the system post mortem because of the panic and because I
have no serial terminal or hardware debugger available.
Otherwise the system flawlessly passes memtest86+ and is completely
stable even under heavy load.
This issue seems to occur on every kernel I have tested so far,
including a stock Ubuntu 15.04 kernel, a vanilla 4.0.5 kernel, a vanilla
4.2.0 kernel and the head of https://github.com/torvalds/linux as of a
few hours ago.
The kernel config used for the kernel taken from git is available here:
http://paste2.org/MH9vV4Le
The 4.2 and 4.0.5 configs are extremely similar and differ only in the
new entries added by oldconfig.
If there is anything I can do to produce more info I am more than happy
to do so.
Or if this is not the right mailing list for this issue please let me
know where would be better.
Many thanks,
Matthew
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx