On Tue, Mar 18, 2025 at 4:00 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > > On Tue, Mar 18, 2025 at 01:14:40PM -0700, Song Liu wrote: > > > > > > See for example all the error conditions in the x86 version of > > > arch_stack_walk_reliable() and in arch/x86/kernel/unwind_frame.c. > > > > I guess I missed this part: > > > > diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c > > index 69d0567a0c38..3bb8e3ea4c4b 100644 > > --- i/arch/arm64/kernel/stacktrace.c > > +++ w/arch/arm64/kernel/stacktrace.c > > @@ -268,6 +268,8 @@ kunwind_next(struct kunwind_state *state) > > case KUNWIND_SOURCE_TASK: > > case KUNWIND_SOURCE_REGS_PC: > > err = kunwind_next_frame_record(state); > > + if (err && err != -ENOENT) > > + state->common.unreliable = true; > > break; > > default: > > err = -EINVAL; > > I still see some issues: > > - do_kunwind() -> kunwind_recover_return_address() can fail > > - do_kunwind() -> consume_state() can fail Great points! I have got the following: diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c index 69d0567a0c38..852e4b322209 100644 --- i/arch/arm64/kernel/stacktrace.c +++ w/arch/arm64/kernel/stacktrace.c @@ -139,6 +139,7 @@ kunwind_recover_return_address(struct kunwind_state *state) (void *)state->common.fp); if (state->common.pc == orig_pc) { WARN_ON_ONCE(state->task == current); + state->common.unreliable = true; return -EINVAL; } state->common.pc = orig_pc; @@ -268,6 +269,8 @@ kunwind_next(struct kunwind_state *state) case KUNWIND_SOURCE_TASK: case KUNWIND_SOURCE_REGS_PC: err = kunwind_next_frame_record(state); + if (err && err != -ENOENT) + state->common.unreliable = true; break; default: err = -EINVAL; @@ -404,12 +407,16 @@ static __always_inline bool arch_kunwind_reliable_consume_entry(const struct kunwind_state *state, void *cookie) { struct kunwind_reliable_consume_entry_data *data = cookie; + bool ret; if (state->common.unreliable) { data->unreliable = true; return false; } - return data->consume_entry(data->cookie, state->common.pc); + ret = data->consume_entry(data->cookie, state->common.pc); + if (!ret) + data->unreliable = true; + return ret; } noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, > - even in the -ENOENT case the unreliable bit has already been set > right before the call to kunwind_next_frame_record_meta(). For this one, do you mean we set state->common.unreliable, but failed to propagate it to data.unreliable? Thanks, Song