Re: Splat in kernel RT while processing incoming network packets

On Mon, Jul 03, 2023 at 09:47:26AM -0300, Wander Lairson Costa wrote:
> Dear all,
> 
> I am writing to report a splat issue we encountered while running the
> Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet
> Steering).
> 
> During some testing of the RT kernel version 6.4.0 with Network RPS enabled,
> we observed a splat occurring in the SoftIRQ subsystem. The splat message is as
> follows:
> 
> [   37.168920] ------------[ cut here ]------------
> [   37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60
> [   37.168935] Modules linked in: xt_conntrack(E) ...
> [   37.168976] Unloaded tainted modules: intel_cstate(E):4 intel_uncore(E):3
> [   37.168994] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E     -------  ---  6.4.0-0.rc2.23.test.eln127.x86_64+rt #1
> [   37.168996] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
> [   37.168998] RIP: 0010:do_softirq_post_smp_call_flush+0x2d/0x60
> [   37.169001] Code: 00 0f 1f 44 00 00 53 89 fb 48 c7 c7 f7 98 be 96 e8 d8 97 d2 00 65 66 8b 05 f8 36 ...
> [   37.169002] RSP: 0018:ffffffff97403eb0 EFLAGS: 00010002
> [   37.169004] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000003
> [   37.169005] RDX: ffff992db7a34840 RSI: ffffffff96be98f7 RDI: ffffffff96bc23d8
> [   37.169006] RBP: ffffffff97410000 R08: ffff992db7a34840 R09: ffff992c87f8dbc0
> [   37.169007] R10: 00000000fffbfc67 R11: 0000000000000018 R12: 0000000000000000
> [   37.169008] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [   37.169011] FS:  0000000000000000(0000) GS:ffff992db7a00000(0000) knlGS:0000000000000000
> [   37.169013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   37.169014] CR2: 00007f028b8da3f8 CR3: 0000000118f44001 CR4: 0000000000370eb0
> [   37.169015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   37.169015] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   37.169016] Call Trace:
> [   37.169018]  <TASK>
> [   37.169020]  flush_smp_call_function_queue+0x78/0x80
> [   37.169026]  do_idle+0xb2/0xd0
> [   37.169030]  cpu_startup_entry+0x1d/0x20
> [   37.169032]  rest_init+0xd1/0xe0
> [   37.169037]  arch_call_rest_init+0xe/0x30
> [   37.169044]  start_kernel+0x342/0x420
> [   37.169046]  x86_64_start_reservations+0x18/0x30
> [   37.169051]  x86_64_start_kernel+0x96/0xa0
> [   37.169054]  secondary_startup_64_no_verify+0x10b/0x10b
> [   37.169059]  </TASK>
> [   37.169060] ---[ end trace 0000000000000000 ]---
> 
> It comes from [1].
> 
> The issue lies in the mechanism RPS uses to defer network packet processing to
> other CPUs. It sends an IPI to the target CPU. The registered callback
> is rps_trigger_softirq, which will raise a softirq, leading to the following
> scenario:
> 
> CPU0                                    CPU1
> | netif_rx()                            |
> | | enqueue_to_backlog(cpu=1)           |
> | | | net_rps_send_ipi()                |
> |                                       | flush_smp_call_function_queue()
> |                                       | | was_pending = local_softirq_pending()
> |                                       | | __flush_smp_call_function_queue()
> |                                       | | rps_trigger_softirq()
> |                                       | | | __raise_softirq_irqoff()
> |                                       | | do_softirq_post_smp_call_flush()
> 
> That has the undesired side effect of raising a softirq in a function call,
> leading to the aforementioned splat.
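
For readers following the diagram above, here is a minimal user-space sketch
of the check that fires. It is not the kernel code: struct model_cpu and
every model_* function are hypothetical stand-ins invented for illustration,
and the model assumes the warning triggers whenever the pending softirq mask
after the flush differs from the snapshot taken before it. The bit index 3
is the position of NET_RX_SOFTIRQ in the kernel's softirq enum.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit index of the network receive softirq (NET_RX_SOFTIRQ in the kernel). */
#define MODEL_NET_RX_SOFTIRQ 3

/* Hypothetical per-CPU state; the kernel keeps these in per-CPU variables. */
struct model_cpu {
    unsigned int softirq_pending; /* stand-in for local_softirq_pending() */
    bool ipi_queued;              /* stand-in for the SMP call function queue */
};

/* Models __raise_softirq_irqoff(): only sets the pending bit. */
static void model_raise_softirq(struct model_cpu *cpu, unsigned int nr)
{
    cpu->softirq_pending |= 1u << nr;
}

/* Sender side (CPU0 in the diagram): models net_rps_send_ipi(). */
static void model_rps_send_ipi(struct model_cpu *target)
{
    target->ipi_queued = true;
}

/*
 * Target side (CPU1 in the diagram): models flush_smp_call_function_queue().
 * Returns true when the post-flush check would warn.
 */
static bool model_flush(struct model_cpu *cpu)
{
    /* Snapshot taken before any queued callback runs. */
    unsigned int was_pending = cpu->softirq_pending;

    if (cpu->ipi_queued) {
        cpu->ipi_queued = false;
        /* Models rps_trigger_softirq() raising NET_RX from the callback. */
        model_raise_softirq(cpu, MODEL_NET_RX_SOFTIRQ);
    }

    /* Models do_softirq_post_smp_call_flush(): compare against the snapshot. */
    if (was_pending != cpu->softirq_pending) {
        fprintf(stderr, "WARNING: softirq raised during SMP call flush\n");
        return true;
    }
    return false;
}
```

Calling model_rps_send_ipi() on a zero-initialized struct model_cpu and then
model_flush() on it returns true in this model, because the callback changes
the pending mask between the snapshot and the check. A flush with nothing
queued returns false.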
> 
> The kernel version is kernel-ark [1], os-build-rt branch. It is essentially the

Correction: kernel-ark [2]

> upstream kernel with the PREEMPT_RT patches, and with RHEL configs. I can
> provide the .config.
> 
> The only solution I have imagined so far is to modify RPS to process packets in
> a kernel thread on RT. But I wonder how that would be different from processing
> them in ksoftirqd.
> 
> Any inputs on the issue?
> 
> [1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306
> 

[2] https://gitlab.com/cki-project/kernel-ark

> Cheers,
> Wander
> 



