Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 10/3/24 10:22 PM, Yonghong Song wrote:

On 10/3/24 3:32 PM, Alexei Starovoitov wrote:
On Thu, Oct 3, 2024 at 1:44 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
Looks like the idea needs more thought.

in_task_stack() won't recognize the private stack,
so it will look like stack overflow and double fault.

do you have CONFIG_VMAP_STACK ?
Yes, my above test runs fine withCONFIG_VMAP_STACK. Let me guard private stack support with
CONFIG_VMAP_STACK for now. Not sure whether distributions enable
CONFIG_VMAP_STACK or not.
Good! but I'm surprised it makes a difference.

That only for the test case I tried. Now I tried the whole bpf selftests
with CONFIG_VMAP_STACK on. There are still some failures. Some of them
due to stack protector. I disabled stack protector and then those stack
protector error gone. But some other errors show up like below:

[   27.186581] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [   27.187480] BUG: unable to handle page fault for address: ffff888109572800
[   27.188299] #PF: supervisor instruction fetch in kernel mode
[   27.189085] #PF: error_code(0x0011) - permissions violation

or

[   27.736844] BUG: unable to handle page fault for address: 0000000080000000
[   27.737759] #PF: supervisor instruction fetch in kernel mode
[   27.738631] #PF: error_code(0x0010) - not-present page
[   27.739455] PGD 0 P4D 0
[   27.739818] Oops: Oops: 0010 [#1] PREEMPT SMP PTI

...

Some further investigations are needed.


I found one failure case (with stackprotector disabled):

[   20.032611] traps: PANIC: double fault, error_code: 0x0
[   20.032615] Oops: double fault: 0000 [#1] PREEMPT SMP PTI
[   20.032619] CPU: 0 UID: 0 PID: 1959 Comm: test_progs Tainted: G           OE      6.11.0-10576-g17baa0096769-dirty #1006
[   20.032623] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   20.032624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   20.032626] RIP: 0010:error_entry+0x17/0x140
[   20.032633] Code: ff 0f 01 f8 e9 56 fe ff ff 90 90 90 90 90 90 90 90 90 90 56 48 8b 74 24 08 48 89 7c 24 08 52 51 50 41 50 41 51 41 52 49
[   20.032635] RSP: 0018:ffffe8ffff400000 EFLAGS: 00010093
[   20.032637] RAX: ffffe8ffff4000a8 RBX: ffffe8ffff4000a8 RCX: ffffffff82201737
[   20.032639] RDX: 0000000000000000 RSI: ffffffff8220128d RDI: ffffe8ffff4000a8
[   20.032640] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   20.032641] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   20.032642] R13: 0000000000000000 R14: 000000000002ed80 R15: 0000000000000000
[   20.032643] FS:  00007f8a3a2006c0(0000) GS:ffff888237c00000(0000) knlGS:ffff888237c00000
[   20.032645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.032646] CR2: ffffe8ffff3ffff8 CR3: 0000000103580002 CR4: 0000000000370ef0
[   20.032649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   20.032650] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   20.032651] Call Trace:
[   20.032660]  <#DF>
[   20.032664]  ? __die_body+0xaf/0xc0
[   20.032667]  ? die+0x2f/0x50
[   20.032670]  ? exc_double_fault+0xbf/0xd0
[   20.032674]  ? asm_exc_double_fault+0x23/0x30
[   20.032678]  ? restore_regs_and_return_to_kernel+0x1b/0x1b
[   20.032681]  ? asm_exc_page_fault+0xd/0x30
[   20.032684]  ? error_entry+0x17/0x140
[   20.032687]  </#DF>

The private stack for cpu 0:
  priv_stack_ptr cpu 0 = [ffffe8ffff434000, ffffe8ffff438000] (total 16KB)
That is, the top stack is ffffe8ffff438000 and the bottom stack is ffffe8ffff434000.

During bpf execution, a softirq may happen, at that point,
stack pointer becomes:
   RSP: 0018:ffffe8ffff400000 (see above)
and there is a read/write (mostly write) to address
   CR2: ffffe8ffff3ffff8
And this may cause a fault.
After this fault, there are some further access and probably because
of invalid stack, double fault happens.

So the quesiton is why RSP is reset to ffffe8ffff400000?
I have not figured out which code changed this? Maybe somebody can help?


Please still root cause the crash without VMAP_STACK.

Sure. Let me investigate cases with VMAP_STACK first and
then will try to look at it without VMAP_STACK.


We need to do a lot more homework here before proceeding.
Look at arch/x86/kernel/dumpstack_64.c
At least we need new stack_type for priv stack.
stack_type_unknown doesn't inspire confidence.
Need to make sure stack trace is still reliable with priv stack.
Though it may look appealing from performance pov.
We may need to go back to r9 approach with push/pop around calls,
since that is surely keeping unwinder happy
while this approach will have to teach unwinder.

Good point.






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux