Le Tue, Feb 11, 2025 at 12:57:33PM +0100, Paul Menzel a écrit : > Dear Frederic, > > > Thank you for your reply. > > > Am 10.02.25 um 14:26 schrieb Frederic Weisbecker: > > Le Mon, Feb 10, 2025 at 12:59:42PM +0100, Paul Menzel a écrit : > > > > Am 10.02.25 um 12:45 schrieb Michał Pecio: > > > > > > > > > > > > > > On Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022, with Linux > > > > > > > > > > > 6.9-rc2+ > > > > > > > > > Just for the record, I am still seeing this with 6.14.0-rc1 > > > > > > > > Is this a regression? If so, which versions were not affected? > > > > > > Unfortunately, I do not know. Right now, my logs go back until September > > > 2024. > > > > > > Sep 22 13:08:04 abreu kernel: Linux version 6.11.0-07273-g1e7530883cd2 (build@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc (Debian 14.2.0-5) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.1) #12 SMP PREEMPT_DYNAMIC Sun Sep 22 09:57:36 CEST 2024 > > > > > > > How hard to reproduce? Wasn't it during resume from hibernation? > > > > > > It’s not easy to reproduce, and I believe it’s not related with resuming > > > from hibernation (which I do not use) or ACPI S3 suspend. I think, I can > > > force it more, when having the USB-C adapter with only the network cable > > > plugged into it, and then running `sudo powertop --auto-tune`. But sometimes > > > it seems unrelated. > > > > > > > IRQ isuses may be a red herring, this code here is a busy wait under > > > > spinlock. There are a few of those, they cause various problems. > > > > > > > > if (xhci_handshake(&xhci->op_regs->status, > > > > STS_RESTORE, 0, 100 * 1000)) { > > > > xhci_warn(xhci, "WARN: xHC restore state timeout\n"); > > > > spin_unlock_irq(&xhci->lock); > > > > return -ETIMEDOUT; > > > > } > > > > > > > > This thing timing out may be close to the root cause of everything. > > > > > > Interesting. Hopefully the USB folks have an idea. > > > > Handler #08 is NET_RX. So something raised the NET_RX on some non-appropriate > > place, perhaps... > > > > Can I ask you one more trace dump? > > > > I need: > > > > echo 1 > /sys/kernel/tracing/events/irq/softirq_raise/enable > > echo 1 > /sys/kernel/tracing/options/stacktrace > > > > Unfortunately this will also involve a small patch: > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > index fa058510af9c..accd2eb8c927 100644 > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -1159,6 +1159,9 @@ static bool report_idle_softirq(void) > > if (local_bh_blocked()) > > return false; > > + trace_printk("STOP\n"); > > + trace_dump_stack(0); > > + tracing_off(); > > pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n", > > pending); > > ratelimit++; > > Thank you for your help. I applied the patch on top of 6.14-rc2, and was > able to reproduce the issue. Please find the Linux messages attached, and > the trace can be downloaded [1]. So here is the offender: => __raise_softirq_irqoff => __napi_schedule => rtl8152_runtime_resume.isra.0 => rtl8152_resume => usb_resume_interface.isra.0 => usb_resume_both => __rpm_callback => rpm_callback => rpm_resume => __pm_runtime_resume => usb_autoresume_device => usb_remote_wakeup => hub_event => process_one_work => worker_thread => kthread => ret_from_fork => ret_from_fork_asm It is calling napi_schedule() from a non-interrupt. And since ____napi_schedule() assumes to be called from an interrupt, it raises the softirq accordingly without waking up ksoftirqd. Can you try the following fix (untested, sorry...) ? diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 468c73974046..8f6ea4e7685c 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp) clear_bit(SELECTIVE_SUSPEND, &tp->flags); smp_mb__after_atomic(); - if (!list_empty(&tp->rx_done)) + if (!list_empty(&tp->rx_done)) { + local_bh_disable(); napi_schedule(&tp->napi); + local_bh_enable(); + } usb_submit_urb(tp->intr_urb, GFP_NOIO); } else { diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 67964dc4db95..1bd730b881f0 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -619,6 +619,17 @@ do { \ (!in_softirq() || in_irq() || in_nmi())); \ } while (0) +/* + * Assert to be either in hardirq or in serving softirq or with + * softirqs disabled. Verifies a safe context to queue a softirq + * with __raise_softirq_irqoff(). + */ +#define lockdep_assert_in_interrupt() \ +do { \ + WARN_ON_ONCE(__lockdep_enabled && !in_interrupt()); \ +} while (0) + + extern void lockdep_assert_in_softirq_func(void); #else @@ -634,6 +645,7 @@ extern void lockdep_assert_in_softirq_func(void); # define lockdep_assert_preemption_enabled() do { } while (0) # define lockdep_assert_preemption_disabled() do { } while (0) # define lockdep_assert_in_softirq() do { } while (0) +# define lockdep_assert_in_interrupt() do { } while (0) # define lockdep_assert_in_softirq_func() do { } while (0) #endif diff --git a/net/core/dev.c b/net/core/dev.c index c0021cbd28fc..80e415ccf2c8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4666,6 +4666,7 @@ static inline void ____napi_schedule(struct softnet_data *sd, struct task_struct *thread; lockdep_assert_irqs_disabled(); + lockdep_assert_in_interrupt(); if (test_bit(NAPI_STATE_THREADED, &napi->state)) { /* Paired with smp_mb__before_atomic() in