On Tue, Jun 04, 2024 at 05:05:59PM +0100, Pierre-Clément Tosi wrote: > On Mon, Jun 03, 2024 at 03:48:08PM +0100, Will Deacon wrote: > > On Wed, May 29, 2024 at 01:12:17PM +0100, Pierre-Clément Tosi wrote: > > > For kCFI, the compiler encodes in the immediate of the BRK (which the > > > CPU places in ESR_ELx) the indices of the two registers it used to hold > > > (resp.) the function pointer and expected type. Therefore, the kCFI > > > handler must be able to parse the contents of the register file at the > > > point where the exception was triggered. > > > > > > To achieve this, introduce a new hypervisor panic path that first stores > > > the CPU context in the per-CPU kvm_hyp_ctxt before calling (directly or > > > indirectly) hyp_panic() and execute it from all EL2 synchronous > > > exception handlers i.e. > > > > > > - call it directly in host_el2_sync_vect (__kvm_hyp_host_vector, EL2t&h) > > > - call it directly in el2t_sync_invalid (__kvm_hyp_vector, EL2t) > > > - set ELR_EL2 to it in el2_sync (__kvm_hyp_vector, EL2h), which ERETs > > > > > > Teach hyp_panic() to decode the kCFI ESR and extract the target and type > > > from the saved CPU context. In VHE, use that information to panic() with > > > a specialized error message. In nVHE, only report it if the host (EL1) > > > has access to the saved CPU context i.e. iff CONFIG_NVHE_EL2_DEBUG=y, > > > which aligns with the behavior of CONFIG_PROTECTED_NVHE_STACKTRACE. > > > > > > Signed-off-by: Pierre-Clément Tosi <ptosi@xxxxxxxxxx> > > > --- > > > arch/arm64/kvm/handle_exit.c | 30 +++++++++++++++++++++++-- > > > arch/arm64/kvm/hyp/entry.S | 24 +++++++++++++++++++- > > > arch/arm64/kvm/hyp/hyp-entry.S | 2 +- > > > arch/arm64/kvm/hyp/include/hyp/switch.h | 4 ++-- > > > arch/arm64/kvm/hyp/nvhe/host.S | 2 +- > > > arch/arm64/kvm/hyp/vhe/switch.c | 26 +++++++++++++++++++-- > > > 6 files changed, 79 insertions(+), 9 deletions(-) > > > > This quite a lot of work just to print out some opaque type numbers > > when CONFIG_NVHE_EL2_DEBUG=y. Is it really worth it? How would I use > > this information to debug an otherwise undebuggable kcfi failure at EL2? > > The type ID alone might not be worth it but what about the target? > > In my experience, non-malicious kCFI failures are often caused by an issue with > the callee, not the caller. Without this patch, only the source of the exception > is reported but, with it, the panic handler also prints the kCFI target (i.e. > value of the function pointer) as a symbol. I think it's less of an issue for EL2, as we don't have tonnes of indirections, but I agree that the target is nice to have. > With the infrastructure for the target in place, it isn't much more work to also > report the type ID. Although it is rarely informative (as you noted), there are > some situations where it can still be useful e.g. if reported as zero and/or has > been corrupted and does not match its value from the ELF. So looking at the implementation, I'm not a huge fan of saving off all the GPRs and then relying on the stage-2 being disabled so that the host can fish out the registers it cares about. I think I'd prefer to provide the target as an additional argument to nvhe_hyp_panic_handler(), meaning that we could even print the VA when CONFIG_NVHE_EL2_DEBUG is disabled. But for now, I suggest we drop this patch along with the testing patches because I think the rest of the series is nearly there and it's a useful change on its own. Will