Re: kernel BUG at mm/usercopy.c:102 -- pc : usercopy_abort

Xiubo Li <xiubli@xxxxxxxxxx> · Mon, 25 Mar 2024 20:06:25 +0800

On 3/25/24 18:14, David Hildenbrand wrote:
On 25.03.24 08:45, Xiubo Li wrote:
Hi guys,

We have recently been hitting the same crash frequently with the latest
kernel when testing kceph, and the call trace looks something like this:

[ 1580.034891] usercopy: Kernel memory exposure attempt detected from
SLUB object 'kmalloc-192' (offset 82, size 499712)!
[ 1580.045866] ------------[ cut here ]------------
[ 1580.050551] kernel BUG at mm/usercopy.c:102!

Entering kdb (current=0xffff8881211f5500, pid 172901) on processor 4
Oops: (null)
due to oops @ 0xffffffff8138cabd
CPU: 4 PID: 172901 Comm: fsstress Tainted: G S 6.6.0-g623393c9d50c #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
RIP: 0010:usercopy_abort+0x6d/0x80
Code: 4c 0f 44 d0 41 53 48 c7 c0 1c e9 13 82 48 c7 c6 71 62 13 82 48 0f
45 f0 48 89 f9 48 c7 c7 f0 6b 1b 82 4c 89 d2 e8 63 2b df ff <0f> 0b 49
c7 c1 44 c8 14 82 4d 89 cb 4d 89 c8 eb a5 66 90 f3 0f 1e
RSP: 0018:ffffc90006dfba88 EFLAGS: 00010246
RAX: 000000000000006a RBX: 000000000007a000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88885fd1d880 RDI: ffff88885fd1d880
RBP: 000000000007a000 R08: 0000000000000000 R09: c0000000ffffdfff
R10: 0000000000000001 R11: ffffc90006dfb930 R12: 0000000000000001
R13: ffff8882b7bbed12 R14: ffff88827a375830 R15: ffff8882b7b44d12
FS:  00007fb24c859500(0000) GS:ffff88885fd00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c2bcf9eb00 CR3: 000000028956c005 CR4: 00000000001706e0
Call Trace:
   <TASK>
   ? kdb_main_loop+0x32c/0xa10
   ? kdb_stub+0x216/0x420
more>

You can see more details in the Ceph tracker:
https://tracker.ceph.com/issues/64471

Where is the full backtrace? The above contains only the kdb backtrace.

Hi David,

The bad news is that there is no further backtrace. All the failures we
have hit look similar to the following logs:

That link also contains:

Entering kdb (current=0xffff9115d14fb980, pid 61925) on processor 5
Oops: (null)
due to oops @ 0xfffffffface3a1d2
CPU: 5 PID: 61925 Comm: ld Kdump: loaded Not tainted
5.14.0-421.el9.x86_64 #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
RIP: 0010:usercopy_abort+0x74/0x76
Code: 14 74 ad 51 48 0f 44 d6 49 c7 c3 cb 9f 73 ad 4c 89 d1 57 48 c7
c6 60 83 75 ad 48 c7 c7 00 83 75 ad 49 0f 44 f3 e8 1b 3b ff ff <0f> 0b
0f b6 d3 4d 89 e0 48 89 e9 31 f6 48 c7 c7 7f 83 75 ad e8 73
RSP: 0018:ffffbb97c16af8d0 EFLAGS: 00010246
RAX: 0000000000000072 RBX: 0000000000000112 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff911d1fd60840 RDI: ffff911d1fd60840
RBP: 0000000000004000 R08: 80000000ffff84b4 R09: 0000000000ffff0a
R10: 0000000000000004 R11: 0000000000000076 R12: ffff9115c0be8b00
R13: 0000000000000001 R14: ffff911665df9f68 R15: ffff9115d16be112
FS:  00007ff20442eb80(0000) GS:ffff911d1fd40000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff20446142d CR3: 00000001215ec003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? show_trace_log_lvl+0x1c4/0x2df
more>

Don't we have more information about the calltrace somewhere? (or a 
reproducer?)

There is no reproducer, and the failing test cases are different each
time, so the failure appears to be random.
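
As a side note, the numbers in the first usercopy report above can be
sanity-checked with a small script. This is only a rough sketch and not
the kernel's actual check: it assumes that a cache named "kmalloc-<N>"
holds N-byte objects, that the whole object is the permitted copy
region, and that the page size is 4096 bytes. The helper names are made
up for illustration.

```python
import re

# Usercopy report line copied from the first log in this thread.
REPORT = ("usercopy: Kernel memory exposure attempt detected from "
          "SLUB object 'kmalloc-192' (offset 82, size 499712)!")

PATTERN = re.compile(
    r"SLUB object '(?P<cache>[^']+)' "
    r"\(offset (?P<offset>\d+), size (?P<size>\d+)\)"
)


def parse_usercopy_report(line):
    """Extract (cache name, offset, copy size) from a SLUB usercopy report."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a SLUB usercopy report")
    return match.group("cache"), int(match.group("offset")), int(match.group("size"))


def kmalloc_object_size(cache):
    """Assumption: 'kmalloc-<N>' means N-byte objects (fails for names like 'kmalloc-1k')."""
    return int(cache.rpartition("-")[2])


def overrun_bytes(cache, offset, size):
    """Bytes by which a copy starting at `offset` runs past the end of the object."""
    return max(0, offset + size - kmalloc_object_size(cache))


cache, offset, size = parse_usercopy_report(REPORT)
print("cache:", cache, "offset:", offset, "size:", size)
print("overrun past object end:", overrun_bytes(cache, offset, size))

# RBX/RBP in the first register dump hold 0x7a000, which equals the reported size.
print("size == 0x7a000:", size == 0x7A000)
# Assumption: 4096-byte pages. The reported size is then a whole number of pages.
print("pages:", size // 4096, "remainder:", size % 4096)
```

Under those assumptions the requested copy length is a whole number of
pages and far larger than the 192-byte object, so the length looks like
a page-granular I/O size rather than something derived from the object
itself. That is only an observation from the numbers, not a diagnosis.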

Thanks

- Xiubo