Linux 3.2: FPU Issue in execve with Intel E5-2620v3 and E7-4880v2

"Cai, Jason" <Jason.Cai@xxxxxxxx> · Wed, 22 Mar 2017 01:58:58 +0000

Dear Kernel Hackers,

I'm Jason Cai, a kernel developer from Dell EMC. I have hit the same issue as
the one Lennart Sorensen reported on Dec 19, 2016.

I have now narrowed the issue down. It seems that an unexpected DNA
(Device Not Available) exception may be triggered in the `execve` code path.
Specifically, it occurs between `setup_new_exec()` and `start_thread()` in
the function `load_elf_binary()`.

I've added a BUG_ON() just before `start_thread()` in `load_elf_binary()` to
assert that the FPU state of the current process descriptor is clean when
performing an exec. It gets triggered, and the output is as follows:

-----------------------------------------------------------------------------
(E3)[      1517.089157] current is bad: ffff8812227387c0 (abuse)
(E3)[      1517.089176] prev: fpu=ffff8811d846c100, fpu_src=ffff8817fbab7500, fpu_fork=ffff880bf5513740, fpu_exec=          (null)
(E3)[      1517.089190] has_fpu=1, fpu_counter=1, flags=402000, CR0=80050033
(E0)[      1517.089223] ------------[ cut here ]------------
(E2)[      1517.095250] kernel BUG at linux-3.2/fs/binfmt_elf.c:1064!
(U0)(MSG-KERN-00005):[      1517.106894] invalid opcode: 0000 [#1] SMP
(E4)[      1517.114030] CPU 23
(E4)[      1517.117055] Modules linked in: ...
(E4)[      1517.192079]
(E4)[      1517.194621] Pid: 29746, comm: abuse Tainted: P           O 3.2.33
(E4)[      1517.207783] RIP: 0010:[<ffffffff81129670>]  [<ffffffff81129670>] load_elf_binary+0x1858/0x1983
(E4)[      1517.218284] RSP: 0018:ffff8817fa15fd08  EFLAGS: 00010292
(E4)[      1517.225087] RAX: 0000000000000053 RBX: ffff8812227387c0 RCX: 0000000081000000
(E4)[      1517.233924] RDX: 0000000081000000 RSI: 0000000000000046 RDI: ffffffff81721140
(E4)[      1517.242761] RBP: ffff8817fa15fe18 R08: 0000000000000000 R09: 000000020fc00000
(E4)[      1517.251597] R10: ffff88187a15fc17 R11: 0000000000000000 R12: ffff880622e3ef80
(E4)[      1517.260432] R13: ffff8811c4333400 R14: ffff8812227387c0 R15: ffff8817fa15ff58
(E4)[      1517.269269] FS:  0000000000000000(0000) GS:ffff88183fd60000(0000) knlGS:0000000000000000
(E4)[      1517.279169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
(E4)[      1517.286455] CR2: 00007fbca10dcba8 CR3: 00000011dd8a7000 CR4: 00000000001407e0
(E4)[      1517.295290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
(E4)[      1517.304125] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
(E4)[      1517.312960] Process abuse (pid: 29746, threadinfo ffff8817fa15e000, task ffff8812227387c0)
(E0)[      1517.323055] Stack:
(E4)[      1517.326178]  0000000000000001 00007fffd47a98e8 00007fffd47a9988 ffff881200000008
(E4)[      1517.335384]  ffff880627961680 ffff8812227387c0 ffff8817fa15e000 ffff8817fa15e000
(E4)[      1517.344586]  ffff8817fa15e000 ffff8812227387c0 0000000000500988 0000000000500778
(E0)[      1517.353798] Call Trace:
(E4)[      1517.357416]  [<ffffffff810ec3a0>] search_binary_handler+0xd6/0x273
(E4)[      1517.365196]  [<ffffffff810edeed>] do_execve_common.clone.28+0x1e1/0x2e8
(E4)[      1517.373458]  [<ffffffff810ee00f>] do_execve+0x1b/0x1d
(E4)[      1517.379975]  [<ffffffff810092b1>] sys_execve+0x49/0xe1
(E4)[      1517.386589]  [<ffffffff813a4b4c>] stub_execve+0x6c/0xc0
(E0)[      1517.393293] Code: 81 31 c0 e8 c3 27 f1 ff 41 0f 20 c0 48 c7 c7 f0 49 51 81 8b 4b 14 0f b6 93 b8 01 00 00 48 8b b3 d8 04 00 00 31 c0 e8 a0 27 f1 ff <0f> 0b 49 8b 95 98 00 00 00 48 8b 75 b8 4c 89 ff e8 ba 7d ed ff
(U1)(MSG-KERN-00005):[      1517.416621] RIP  [<ffffffff81129670>] load_elf_binary+0x1858/0x1983
(E4)[      1517.426164]  RSP <ffff8817fa15fd08>
(E4)[      1517.430961] ---[ end trace 5dcaec314d0b0edb ]---
(U0)(MSG-KERN-00018):[      1517.436994] Kernel panic - not syncing: Fatal exception
(E4)[      1517.445346] Pid: 29746, comm: abuse Tainted: P      D    O 3.2.33
(E4)[      1517.454276] Call Trace:
(E4)[      1517.457893]  [<ffffffff8139af77>] panic+0xb2/0x1d2
(E4)[      1517.464122]  [<ffffffff8103c75a>] ? kmsg_dump+0x5d/0xdf
(E4)[      1517.470825]  [<ffffffff8139eb8a>] oops_end+0xae/0xbe
(E4)[      1517.477246]  [<ffffffff81004b81>] die+0x5a/0x65
(E4)[      1517.483185]  [<ffffffff8139e6b8>] do_trap+0x121/0x130
(E4)[      1517.489703]  [<ffffffff81002a27>] do_invalid_op+0x96/0x9f
(E4)[      1517.496601]  [<ffffffff81129670>] ? load_elf_binary+0x1858/0x1983
(E4)[      1517.504280]  [<ffffffff813a63f5>] invalid_op+0x15/0x20
(E4)[      1517.510893]  [<ffffffff81129670>] ? load_elf_binary+0x1858/0x1983
(E4)[      1517.518575]  [<ffffffff81129670>] ? load_elf_binary+0x1858/0x1983
(E4)[      1517.526257]  [<ffffffff810ec3a0>] search_binary_handler+0xd6/0x273
(E4)[      1517.534035]  [<ffffffff810edeed>] do_execve_common.clone.28+0x1e1/0x2e8
(E4)[      1517.542289]  [<ffffffff810ee00f>] do_execve+0x1b/0x1d
(E4)[      1517.548810]  [<ffffffff810092b1>] sys_execve+0x49/0xe1
(E4)[      1517.555427]  [<ffffffff813a4b4c>] stub_execve+0x6c/0xc0
--------------------------------------------------------------------------------------------------

The kernel code I'm testing is the same as the stable branch linux-3.2.y.
AFAIK, there are no FPU instructions between `setup_new_exec()` and
`start_thread()` in `load_elf_binary()`.

The BUG_ON() check is as follows:
--------------------------------------------------------------------------------------------------
if (current->thread.has_fpu || current->fpu_counter || tsk_used_math(current)) {
    /* printk some status related to FPU ... */
    BUG_ON(1);
}
--------------------------------------------------------------------------------------------------

Maybe a quick fix would be simply not to free the FPU state in `start_thread_common()`.

Last but not least, so far this issue has only been seen on systems equipped
with Intel E5-2620v3 and E7-4880v2 CPUs.

Thus, I'm still wondering whether this could be a CPU issue or something else.
How can I verify that?

I would greatly appreciate any feedback.

Best regards,
Jason Cai