Re: 4.14.28 crash on tcp_push

Hi Pavlos,

On Tue, Mar 20, 2018 at 01:01:38PM +0100, Pavlos Parissis wrote:
> Hi,
> 
> We were upgrading a production system from 4.14.20 to 4.14.28 when we got the following crash, and I
> was wondering if anyone has seen a similar crash:
> 
> [  346.435832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> [  346.473216] IP: tcp_push+0x42/0x120
> [  346.489607] PGD 8000001838949067 P4D 8000001838949067 PUD 183894a067 PMD 0
> [  346.523318] Oops: 0002 [#1] SMP PTI
> [  346.540395] Modules linked in: sctp_diag sctp dccp_diag dccp udp_diag unix_diag tcp_diag
> inet_diag 8021q garp mrp input_leds joydev xfs libcrc32c loop vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel iTCO_wdt crypto_simd glue_helper cryptd iTCO_vendor_support intel_cstate lpc_ich
> intel_rapl_perf mfd_core hpwdt i2c_i801 hpilo pcspkr wmi sg ipmi_si ipmi_devintf ipmi_msghandler
> acpi_power_meter shpchp ioatdma ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops sd_mod ttm crc32c_intel ixgbe mdio hpsa tg3 i40e drm
> dca ptp pps_core scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod dax
> [  346.854574] CPU: 5 PID: 1533 Comm: carbon-submissi Not tainted 4.14.28-1.el7.x86_64 #1
> [  346.892452] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/25/2017
> [  346.931641] task: ffff88183806c5c0 task.stack: ffffc90007ea8000
> [  346.959768] RIP: 0010:tcp_push+0x42/0x120
> [  346.978914] RSP: 0018:ffffc90007eabc78 EFLAGS: 00010246
> [  347.004199] RAX: 0000000000000000 RBX: 00000000000000c2 RCX: 0000000000000001
> [  347.038684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88184ad0c800
> [  347.073236] RBP: ffffc90007eabc78 R08: 000000000000ffcb R09: 0000000000000257
> [  347.108070] R10: ffff88184ad0c958 R11: 000000000000ffcb R12: 00000000ffffffe0
> [  347.142006] R13: 00000000ffffffe0 R14: ffff88184ad0c800 R15: ffff88184ad0c958
> [  347.176290] FS:  00007fbad3ff7700(0000) GS:ffff880c4fd40000(0000) knlGS:0000000000000000
> [  347.215545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  347.243091] CR2: 0000000000000038 CR3: 0000001838bd4004 CR4: 00000000001606e0
> [  347.276950] Call Trace:
> [  347.288526]  tcp_sendmsg_locked+0x118/0xe50
> [  347.308321]  tcp_sendmsg+0x2c/0x50
> [  347.324517]  inet_sendmsg+0x37/0xb0
> [  347.341379]  sock_sendmsg+0x3e/0x50
> [  347.358018]  sock_write_iter+0x85/0xf0
> [  347.376095]  __vfs_write+0xfb/0x160
> [  347.392961]  vfs_write+0xb2/0x1b0
> [  347.408915]  ? syscall_trace_enter+0x1cd/0x2b0
> [  347.430458]  SyS_write+0x55/0xc0
> [  347.446047]  do_syscall_64+0x79/0x1b0
> [  347.463757]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [  347.488102] RIP: 0033:0x7fbae295a6ad
> [  347.505100] RSP: 002b:00007fbad3ff6e60 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> [  347.541196] RAX: ffffffffffffffda RBX: 00000000000000c7 RCX: 00007fbae295a6ad
> [  347.575649] RDX: 00000000000000c7 RSI: 00007fbabc06ab60 RDI: 0000000000000013
> [  347.609762] RBP: 000000000000000a R08: 00007fbabc06ab60 R09: 00000000022a58f0
> [  347.643653] R10: 0000000000001a05 R11: 0000000000000293 R12: 00007fbabc06ab60
> [  347.677684] R13: 000000000200f040 R14: 00000000022a1840 R15: 00000000000000cb
> [  347.712207] Code: 48 8b 87 60 01 00 00 4c 8d 97 58 01 00 00 41 89 d3 ba 00 00 00 00 49 39 c2 48
> 0f 44 c2 89 f2 81 e2 00 80 00 00 0f 85 af 00 00 00 <80> 48 38 08 44 8b 8f 74 06 00 00 44 89 8f 7c 06
> 00 00 83 e6 01
> [  347.803312] RIP: tcp_push+0x42/0x120 RSP: ffffc90007eabc78
> [  347.829666] CR2: 0000000000000038
> [  347.845805] ---[ end trace 031807a627822772 ]---
> [  347.873681] Kernel panic - not syncing: Fatal exception
> [  347.898899] Kernel Offset: disabled
> [  347.920580] Rebooting in 70 seconds..

Interesting, I also experienced a spontaneous panic on my home firewall
after upgrading it from 4.14.10 to 4.14.27, but I didn't have any symbols
in the traces, so the dump wasn't usable. All I know is that it was
a NULL deref with a very small offset as well. It may be totally unrelated,
but the coincidence is troubling, especially since I haven't had a
panic in -stable for a very long time.
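
For reference, here is a minimal sketch of how the "NULL deref with a very
small offset" can be read directly from a trace like the one quoted above.
It is only an illustration: the helper name summarize() is hypothetical, the
input is a shortened excerpt of the quoted oops, and the decoder handles
just the single instruction form that appears at the <80> marker (opcode
0x80 with an 8-bit displacement off RAX). A real analysis would use the
kernel's scripts/decodecode or objdump instead.

```python
import re

# Excerpt of the quoted trace; the Code: line is shortened around the <..> marker.
OOPS = """\
IP: tcp_push+0x42/0x120
RAX: 0000000000000000 RBX: 00000000000000c2 RCX: 0000000000000001
CR2: 0000000000000038 CR3: 0000001838bd4004 CR4: 00000000001606e0
Code: 0f 85 af 00 00 00 <80> 48 38 08 44 8b 8f 74 06 00 00
"""

# x86 "group 1" operations selected by the reg field of ModRM for opcode 0x80.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def summarize(oops):
    """Hypothetical helper: pull out symbol, CR2, RAX and decode the faulting byte op."""
    sym, off = re.search(r"IP: (\w+)\+(0x[0-9a-f]+)/", oops).groups()
    cr2 = int(re.search(r"CR2: ([0-9a-f]+)", oops).group(1), 16)
    rax = int(re.search(r"RAX: ([0-9a-f]+)", oops).group(1), 16)
    marked = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})+)", oops)
    opcode = int(marked.group(1), 16)
    rest = [int(b, 16) for b in marked.group(2).split()]

    result = {"symbol": sym, "offset": int(off, 16), "cr2": cr2, "rax": rax}

    # Assumption: only opcode 0x80 (op r/m8, imm8) with ModRM mod=01 (disp8)
    # and rm=000 (base register RAX) is handled; anything else is left undecoded.
    modrm = rest[0]
    if opcode == 0x80 and modrm >> 6 == 1 and modrm & 7 == 0:
        disp = rest[1] - 256 if rest[1] >= 128 else rest[1]  # disp8 is signed
        result["op"] = GROUP1[(modrm >> 3) & 7]
        result["disp"] = disp
        result["imm"] = rest[2]
        result["target"] = rax + disp
    return result


info = summarize(OOPS)
print(info["symbol"], hex(info["offset"]), info["op"], hex(info["target"]))
```

Under those assumptions the faulting instruction is a byte-wide OR of 0x08
into offset 0x38 from RAX, and with RAX at zero the computed target equals
CR2, which is what one expects from a write through a NULL pointer.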

Ah, I've just seen your second e-mail. So if it's the same issue as the one
addressed by the patch you pointed to, the bug is 4.14-only, and so is the
fix. It will likely come with the next batch of networking backports.

Cheers,
Willy