On 06.01.2018 at 12:40, Simon Leinen wrote:
> Yves-Alexis Perez wrote:
>> since kernel 4.11 (sorry it took so long to report) I have a box
>> failing to boot with a NULL pointer dereference (the box is stuck
>> there afterwards).
>
> I get the same result on a Quanta server with several 4.13 and 4.14
> kernels (from the Ubuntu "mainline" and Xenial hwe-edge PPAs).
>
> This (I guess) problem had been reported by Stefan Priebe under
> "isci regression in 4.11.0-rc2 by scsi: libsas: allow async aborts"
> on 8 November, 2017[1]. That report didn't elicit any response here.

Yes, and Christoph Hellwig hasn't responded yet either. So I reverted that
commit on my own as well.

Stefan

>
>> The bug has also been reported to the Debian BTS ([2]) and a
>> suggestion to revert 90965761 has been made. I can confirm it fixes the
>> boot issue.
>
> The Debian people have implemented the suggestion to revert 90965761 as
> of their 4.14.12-1 kernel package[2].
>
>> I don't have the complete stack trace at hand but there's an example
>> in the Debian bug.
>
> Here's a stack trace from my server. It was copied and pasted from a
> serial console (IPMI SOL), I hope it's complete.
>
> [ 9.184043] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 9.184055] IP: isci_task_abort_task+0x43/0x400 [isci]
> [ 9.184056] PGD 0
> [ 9.184056] P4D 0
> [ 9.184057]
> [ 9.184058] Oops: 0000 [#1] SMP
> [ 9.184060] Modules linked in: aesni_intel(+) aes_x86_64 crypto_simd glue_helper cryptd mei_me intel_cstate intel_rapl_perf mei shpchp lpc_ich ipmi_si(+) mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler autofs4 btrfs xor raid6_pq ast ttm drm_kms_helper ixgbe igb syscopyarea isci sysfillrect i2c_algo_bit dca sysimgblt libsas fb_sys_fops ptp mdio drm scsi_transport_sas pps_core wmi
> [ 9.184084] CPU: 18 PID: 434 Comm: kworker/u48:1 Not tainted 4.13.0-21-generic #24~16.04.1-Ubuntu
> [ 9.184084] Hardware name: Quanta S210-X12RS V2/S210-X12RS V2, BIOS S2RQ4A08 08/12/2013
> [ 9.184090] Workqueue: scsi_tmf_0 scmd_eh_abort_handler
> [ 9.184091] task: ffff96507bb05d00 task.stack: ffffa2de87bb4000
> [ 9.184095] RIP: 0010:isci_task_abort_task+0x43/0x400 [isci]
> [ 9.184095] RSP: 0018:ffffa2de87bb7c88 EFLAGS: 00010246
> [ 9.184096] RAX: 0000000000000000 RBX: ffff9650782f11a8 RCX: 0000000000000000
> [ 9.184097] RDX: 0000000000000000 RSI: ffff9650782f11a8 RDI: 0000000000000000
> [ 9.184097] RBP: ffffa2de87bb7e28 R08: 0000000000000000 R09: 0000000000000001
> [ 9.184098] R10: 000000000000b8cb R11: 00000000000002f3 R12: ffff9650782f1148
> [ 9.184098] R13: ffff9650758cb800 R14: 0000000000000008 R15: 0000000000000000
> [ 9.184099] FS: 0000000000000000(0000) GS:ffff9660bf380000(0000) knlGS:0000000000000000
> [ 9.184100] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9.184100] CR2: 0000000000000000 CR3: 000000004b009000 CR4: 00000000001406e0
> [ 9.184101] Call Trace:
> [ 9.184107] ? cpumask_next_and+0x31/0x50
> [ 9.184110] ? load_balance+0x1b5/0x9c0
> [ 9.184114] ? sched_clock+0x9/0x10
> [ 9.184116] ? sched_clock+0x9/0x10
> [ 9.184117] ? sched_clock+0x9/0x10
> [ 9.184120] ? sched_clock_cpu+0x11/0xb0
> [ 9.184121] ? pick_next_task_fair+0x3c7/0x560
> [ 9.184123] ? __switch_to+0x211/0x510
> [ 9.184125] ? put_prev_entity+0x27/0x100
> [ 9.184129] sas_eh_abort_handler+0x30/0x50 [libsas]
> [ 9.184131] scmd_eh_abort_handler+0x74/0x230
> [ 9.184135] process_one_work+0x156/0x410
> [ 9.184136] worker_thread+0x4b/0x460
> [ 9.184138] kthread+0x109/0x140
> [ 9.184139] ? process_one_work+0x410/0x410
> [ 9.184140] ? kthread_create_on_node+0x70/0x70
> [ 9.184143] ret_from_fork+0x25/0x30
> [ 9.184144] Code: 08 48 81 ec 78 01 00 00 c7 85 78 fe ff ff 00 00 00 00 c7 85 80 fe ff ff 00 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 <48> 8b 07 48 8b 40 30 48 8b 80 90 02 00 00 4c 8b a0 28 01 00 00
> [ 9.184160] RIP: isci_task_abort_task+0x43/0x400 [isci] RSP: ffffa2de87bb7c88
> [ 9.184161] CR2: 0000000000000000
> [ 9.184162] ---[ end trace bf9920b58fca631f ]---
>
>> The machine is a Dell Precision T5600 with the following SATA
>> controllers:
>
>> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA
>> AHCI Controller (rev 05)
>> 05:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port
>> SATA Storage Control Unit (rev 05)
>
> Mine is a Quanta S210-X12RS server with only one SATA controller:
>
> 08:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 05)
>
> Connected to that SATA controller are two Samsung 850 EVO 250GB SSDs and
> one 3TB WD Red disk.
>
>> If you need more information or need me to test something, please ask.
>
> Likewise.
>
> Best regards,
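
For anyone triaging a similar trace, the "Code:" line of the oops above can be read mechanically: the byte the kernel wraps in <..> is the first byte of the faulting instruction. The following is only a minimal sketch in Python. The helper names (faulting_bytes, decode_simple_load) are made up for illustration, and the decoder handles just the one encoding seen here (REX.W 8b with a plain base register), so it is not a general disassembler. For this trace the marked bytes 48 8b 07 are a load through RDI, and the register dump shows RDI and CR2 both zero, which would be consistent with a NULL pointer being dereferenced early in isci_task_abort_task.

```python
import re

# "Code:" line copied from the oops in this thread.
OOPS_CODE = (
    "Code: 08 48 81 ec 78 01 00 00 c7 85 78 fe ff ff 00 00 00 00 "
    "c7 85 80 fe ff ff 00 00 00 00 65 48 8b 04 25 28 00 00 00 "
    "48 89 45 d0 31 c0 <48> 8b 07 48 8b 40 30 48 8b 80 90 02 00 00 "
    "4c 8b a0 28 01 00 00"
)

# 64-bit general purpose registers in ModRM encoding order (no REX extension).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return (index, values): all code bytes and the index of the <..> byte."""
    tokens = code_line.split("Code:", 1)[-1].split()
    values = []
    fault_index = None
    for tok in tokens:
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            fault_index = len(values)
            values.append(int(marked.group(1), 16))
        elif re.fullmatch(r"[0-9a-fA-F]{2}", tok):
            values.append(int(tok, 16))
    if fault_index is None:
        raise ValueError("no <..> marker found in Code: line")
    return fault_index, values


def decode_simple_load(insn):
    """Decode only 'REX.W 8b /r' with mod=00 and a plain base register.

    Returns AT&T-style text, or None for any encoding this sketch
    does not handle (SIB byte, displacement, extended registers).
    """
    if len(insn) < 3 or insn[0] != 0x48 or insn[1] != 0x8B:
        return None
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):  # 4 = SIB follows, 5 = RIP-relative
        return None
    return f"mov (%{REGS64[rm]}),%{REGS64[reg]}"


if __name__ == "__main__":
    idx, code = faulting_bytes(OOPS_CODE)
    print("faulting instruction starts at byte", idx)
    print(decode_simple_load(code[idx:idx + 3]))
```

In practice the kernel tree's scripts/decodecode does this properly; the sketch only shows why the marked bytes point at a load through RDI.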