Hi Tyler, On 18/01/17 23:26, Baicar, Tyler wrote: > On 1/17/2017 3:31 AM, James Morse wrote: >> On 12/01/17 18:15, Tyler Baicar wrote: >>> SEA exceptions are often caused by an uncorrected hardware >>> error, and are handled when data abort and instruction abort >>> exception classes have specific values for their Fault Status >>> Code. >>> When SEA occurs, before killing the process, go through >>> the handlers registered in the notification list. >>> Update fault_info[] with specific SEA faults so that the >>> new SEA handler is used. >>> @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr, >>> struct pt_regs *regs) >>> return 1; >>> } >>> +/* >>> + * This abort handler deals with Synchronous External Abort. >>> + * It calls notifiers, and then returns "fault". >>> + */ >>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >>> +{ >>> + struct siginfo info; >>> + >>> + atomic_notifier_call_chain(&sea_handler_chain, 0, NULL); >>> + >>> + pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n", >>> + fault_name(esr), esr, addr); >>> + >>> + info.si_signo = SIGBUS; >>> + info.si_errno = 0; >>> + info.si_code = 0; >> Half of the other do_*() functions in this file read the signo and code from the >> fault_info table. >> >> >>> + info.si_addr = (void __user *)addr; >> addr here was read from FAR_EL1, but for some of the classes of exception you >> have listed below this register isn't updated with the faulting address. >> >> The ARM-ARM version 'k' in D1.10.5 "Summary of registers on faults taken to an >> Exception level that is using Aarch64" has: >>> The architecture permits that the FAR_ELx is UNKNOWN for Synchronous External >>> Aborts other than Synchronous External Aborts on Translation Table Walks. In >>> this case, the ISS.FnV bit returned in ESR_ELx indicates whether FAR_ELx is >>> valid. >> This is a problem if we get 'synchronous external abort' or 'synchronous parity >> error' while a user space process was running. > It looks like this would just cause an incorrect address to be printed in the > above pr_err. > Unless I'm missing something, I don't see arm64_notify_die or anything that gets > called from > there using the info.si_addr variable. I may be misreading something here... This patch has: > info.si_addr = (void __user *)addr; > arm64_notify_die("", regs, &info, esr); >From arch/arm64/kernel/traps.c:arm64_notify_die(): > if (user_mode(regs)) { > current->thread.fault_address = 0; > current->thread.fault_code = err; > force_sig_info(info->si_signo, info, current); > } So if the SEA interrupted userspace, we put maybe-unknown addr into force_sig_info() to deliver a signal to user space. User-space then gets a copy of the info struct containing the maybe-unknown addr. I think this is an existing bug, but if we are separating the synchronous external aborts from the generic do_bad handler, we should probably check the FnV bit. (I think we should still print it out) > What do you suggest I do here? The firmware should be reporting the physical and > virtual > address information if it is available in the HEST entry that the kernel will > parse. Its not just firmware that may trigger this, other SoCs may use it for parity or ECC errors, and they may not always have a valid address in FAR_EL1. I think we should check the FnV bit in the esr variable and set info.si_addr to 0 if the addr we have isn't valid: 'For some implementations, the value of si_addr may be inaccurate.' [0] Thanks, James [0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm