Hi Adrian,

On Tue, Sep 03, 2024 at 06:33:55PM +0200, John Paul Adrian Glaubitz wrote:
> Hi Feng,
>
> > When debugging a kernel hang during suspend/resume, there are random
> > memory corruptions in different places, such as one detected by the
> > scheduler with the error message:
> >
> > "Kernel panic - not syncing: corrupted stack end detected inside scheduler"
> >
> > Dumping the corrupted memory around the stack end gives more direct
> > hints about how the memory is corrupted:
> >
> > "
> > Corrupted Stack: ff11000122770000: ff ff ff ff ff ff 14 91 82 3b 78 e8 08 00 45 00 .........;x...E.
> > Corrupted Stack: ff11000122770010: 00 1d 2a ff 40 00 40 11 98 c8 0a ef 30 2c 0a ef ..*.@.@.....0,..
> > Corrupted Stack: ff11000122770020: 30 ff a2 00 22 3d 00 09 9a 95 2a 00 00 00 00 00 0..."=....*.....
> > ...
> > Kernel panic - not syncing: corrupted stack end detected inside scheduler
> > "
> >
> > With it, the culprit was quickly identified as an Ethernet driver
> > and its DMA operations.
> >
> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> > ---
> >  kernel/sched/core.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a795e030678c..1280f7012bc5 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5949,8 +5949,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
> >  static inline void schedule_debug(struct task_struct *prev, bool preempt)
> >  {
> >  #ifdef CONFIG_SCHED_STACK_END_CHECK
> > -	if (task_stack_end_corrupted(prev))
> > +	if (task_stack_end_corrupted(prev)) {
> > +		unsigned long *ptr = end_of_stack(prev);
> > +
> > +		/* Dump 16 ulong words around the corruption point */
> > +#ifdef CONFIG_STACK_GROWSUP
> > +		ptr -= 15;
> > +#endif
> > +		print_hex_dump(KERN_ERR, "Corrupted Stack: ",
> > +			DUMP_PREFIX_ADDRESS, 16, 1, ptr, 16 * sizeof(*ptr), 1);
> > +
> >  		panic("corrupted stack end detected inside scheduler\n");
> > +	}
> >
> >  	if (task_scs_end_corrupted(prev))
> >  		panic("corrupted shadow stack detected inside scheduler\n");
>
> Have you gotten any feedback on this? Would be nice to get this merged as we're
> seeing crashes due to stack corruption on sparc from time to time and having the
> end of the stack dumped in such cases would make debugging here a bit easier.

Thanks for the review and feedback! So far I haven't gotten any response
from the maintainers.

Hi Peter and maintainers,

Could you help review this patch? It helps debug those naughty memory
corruption issues. Thanks!

There is a v2 which applies to the latest linux-next branch:
https://lore.kernel.org/lkml/20240207143523.438816-1-feng.tang@xxxxxxxxx/

- Feng
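For readers following the pointer arithmetic in the patch, here is a minimal user-space sketch (not kernel code) of what the check and the dump do. The sim_* helpers, format_row(), ROW_BUF and the "0000:" offset prefix are hypothetical illustrations. The sketch assumes the end-of-stack canary is the lowest word of a downward-growing stack and the highest word of an upward-growing one (CONFIG_STACK_GROWSUP), and it prints offsets instead of addresses, so it only approximates print_hex_dump(..., DUMP_PREFIX_ADDRESS, 16, 1, ...) output. STACK_END_MAGIC uses the kernel's value 0x57AC6E9D.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Kernel value of STACK_END_MAGIC (include/uapi/linux/magic.h). */
#define STACK_END_MAGIC 0x57AC6E9DUL
/* Number of unsigned long words dumped, as in the patch. */
#define DUMP_WORDS 16
/* Hypothetical: large enough for one formatted 16-byte row. */
#define ROW_BUF 96

/* Hypothetical stand-in for end_of_stack(): the canary is the lowest word
 * of a downward-growing stack and the highest word of an upward-growing one. */
static unsigned long *sim_end_of_stack(unsigned long *stack, size_t nwords,
				       int grows_up)
{
	return grows_up ? stack + nwords - 1 : stack;
}

/* Mirrors the idea of task_stack_end_corrupted(): canary lost its magic. */
static int sim_stack_end_corrupted(const unsigned long *end)
{
	return *end != STACK_END_MAGIC;
}

/* Start of the 16-word window: begins at the canary on a downward-growing
 * stack; moved back 15 words (ptr -= 15) on an upward-growing stack so the
 * window ends at the canary instead of running past the stack. */
static unsigned long *sim_dump_start(unsigned long *end, int grows_up)
{
	return grows_up ? end - (DUMP_WORDS - 1) : end;
}

/* Format one 16-byte row: offset, hex bytes, then ASCII ('.' if unprintable). */
static void format_row(char out[ROW_BUF], size_t offset,
		       const unsigned char row[16])
{
	size_t n = (size_t)snprintf(out, ROW_BUF, "%04zx:", offset);

	for (int i = 0; i < 16; i++)
		n += (size_t)snprintf(out + n, ROW_BUF - n, " %02x", row[i]);
	assert(n + 2 + 16 < ROW_BUF); /* room for separator, ASCII and NUL */
	out[n++] = ' ';
	out[n++] = ' ';
	for (int i = 0; i < 16; i++)
		out[n++] = isprint(row[i]) ? (char)row[i] : '.';
	out[n] = '\0';
}

/* Print DUMP_WORDS words starting at start, 16 bytes per row. */
static void sim_dump_window(const unsigned long *start)
{
	const unsigned char *bytes = (const unsigned char *)start;
	size_t len = DUMP_WORDS * sizeof(*start);
	char line[ROW_BUF];

	for (size_t off = 0; off < len; off += 16) {
		format_row(line, off, bytes + off);
		printf("Corrupted Stack: %s\n", line);
	}
}
```

Calling sim_dump_window(sim_dump_start(end, grows_up)) once sim_stack_end_corrupted(end) returns nonzero prints 16 words (128 bytes on a 64-bit build) in 16-byte rows. Only the upward-growing case needs the 15-word adjustment, which matches the patch applying ptr -= 15 under CONFIG_STACK_GROWSUP alone.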