On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@xxxxxxxxx> wrote:
>> Anyhow, since checking the firewalls/APs to see if you have
>> permission will probably only get you yet another fault if
>> things are walled off, the robust way of dealing with this
>> sort of situation is by probing the device with a read while
>> trapping bus faults. This also handles modules that are
>> unreachable for other reasons, e.g. being disabled by eFuse.
>
> It is possible to patch kernel code to mask or ignore that fault?
> Can you help me with something like that?

As I mentioned, I'm still learning my way around the kernel, so I don't feel very comfortable suggesting a concrete patch just yet. I've been browsing arch/arm/mm/ however, and my impression is that all that would be required is editing fault.c by making a copy of do_bad but containing

    return user_mode(regs) || !fixup_exception(regs);

and hooking it onto the appropriate fault codes. However, this really needs the opinion of someone more familiar with this code.
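To make the intended behaviour of that one-liner concrete, here is a small user-space model of it. This is only a sketch, not a kernel patch: struct pt_regs, user_mode() and fixup_exception() below are simplified stand-ins for the kernel's versions, the fields from_user, has_fixup and fixed_up are made up for the model, and do_bus_fault_fixup is a hypothetical name. The only thing taken from fault.c is the convention that a handler returns 0 when it has dealt with the fault and nonzero when the kernel should report it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct pt_regs (hypothetical fields). */
struct pt_regs {
    bool from_user;  /* models what user_mode(regs) would report */
    bool has_fixup;  /* models "the faulting PC has an exception-table entry" */
    bool fixed_up;   /* set when the modelled fixup has been applied */
};

/* Stand-in for the kernel's user_mode(): nonzero if the fault came from user space. */
static int user_mode(const struct pt_regs *regs)
{
    return regs->from_user;
}

/*
 * Stand-in for the kernel's fixup_exception(): returns 1 if a fixup was found
 * and applied, 0 otherwise. The real one redirects the PC to the fixup code.
 */
static int fixup_exception(struct pt_regs *regs)
{
    if (!regs->has_fixup)
        return 0;
    regs->fixed_up = true;
    return 1;
}

/*
 * Hypothetical handler with the same signature as do_bad.
 * Returns 0 (handled) only for a kernel-mode fault that has a fixup entry;
 * returns 1 (not handled) for user-mode faults and for kernel faults
 * without a fixup, so those are still reported as before.
 */
static int do_bus_fault_fixup(unsigned long addr, unsigned int fsr,
                              struct pt_regs *regs)
{
    (void)addr;
    (void)fsr;
    assert(regs != NULL);
    return user_mode(regs) || !fixup_exception(regs);
}
```

In the kernel proper, such a handler would presumably be registered for the relevant status codes with hook_fault_code(), and the probing read would need an exception-table entry of its own for fixup_exception() to find.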

I do have an observation to make on the issue of fault decoding: the list in fsr-2level.c may be "standard ARMv3 and ARMv4 aborts" but they are quite wrong for ARMv7, which has:

  [ 0]  -
  [ 1]  alignment fault
  [ 2]  debug event
  [ 3]  section access flag fault
  [ 4]  instruction cache maintenance fault (reported via data abort)
  [ 5]  section translation fault
  [ 6]  page access flag fault
  [ 7]  page translation fault
  [ 8]  bus error on access
  [ 9]  section domain fault
  [10]  -
  [11]  page domain fault
  [12]  bus error on section table walk
  [13]  section permission fault
  [14]  bus error on page table walk
  [15]  page permission fault
  [16]  (TLB conflict abort)
  [17]  -
  [18]  -
  [19]  -
  [20]  (lockdown abort)
  [21]  -
  [22]  async bus error (reported via data abort)
  [23]  -
  [24]  async parity/ECC error (reported via data abort)
  [25]  parity/ECC error on access
  [26]  (coprocessor abort)
  [27]  -
  [28]  parity/ECC error on section table walk
  [29]  -
  [30]  parity/ECC error on page table walk
  [31]  -

Some entries are patched up near the bottom of fault.c but many bogus messages remain. For example, the "on linefetch" vs "on non-linefetch" distinction is misleading since no such thing can be inferred from the fault status on v7. Also, the i-cache maintenance fault handling looks wrong to me: it should fetch the actual fault status from IFSR (even though the address still comes from DFAR) and dispatch based on that.

Async external aborts (async bus error and async parity/ECC error) give you basically no info. DFAR will contain garbage, hence displaying it will confuse rather than enlighten. A traceback is pointless since the instruction that caused the access is long retired. Likewise, user_mode() doesn't matter since a transition to kernel space may have happened after the access that caused the abort. Basically they should be treated more as an IRQ than as a fault (note they can also be masked just like irqs).
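
For reference, the indices in the table above are the 5-bit fault status, which in the short-descriptor format is assembled from DFSR bits [3:0] plus bit 10 as the top bit. A minimal decoder along those lines might look like the sketch below. The names fsr_status, fsr_name, fsr_is_async and v7_fault_names are made up for this example and are not the kernel's; the strings are just the table entries, and reserved slots are reported as "reserved".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Extract the 5-bit fault status from a short-descriptor DFSR/IFSR value:
 * FS[3:0] is in bits [3:0], FS[4] is in bit 10. Other bits are ignored.
 */
static unsigned int fsr_status(uint32_t fsr)
{
    return (fsr & 0xfu) | ((fsr >> 6) & 0x10u);
}

/* Names copied from the ARMv7 table in this mail; NULL means unused slot. */
static const char *const v7_fault_names[32] = {
    [1]  = "alignment fault",
    [2]  = "debug event",
    [3]  = "section access flag fault",
    [4]  = "instruction cache maintenance fault",
    [5]  = "section translation fault",
    [6]  = "page access flag fault",
    [7]  = "page translation fault",
    [8]  = "bus error on access",
    [9]  = "section domain fault",
    [11] = "page domain fault",
    [12] = "bus error on section table walk",
    [13] = "section permission fault",
    [14] = "bus error on page table walk",
    [15] = "page permission fault",
    [16] = "(TLB conflict abort)",
    [20] = "(lockdown abort)",
    [22] = "async bus error",
    [24] = "async parity/ECC error",
    [25] = "parity/ECC error on access",
    [26] = "(coprocessor abort)",
    [28] = "parity/ECC error on section table walk",
    [30] = "parity/ECC error on page table walk",
};

/* Human-readable name for a raw fault status register value. */
static const char *fsr_name(uint32_t fsr)
{
    unsigned int fs = fsr_status(fsr);

    assert(fs < 32);
    return v7_fault_names[fs] ? v7_fault_names[fs] : "reserved";
}

/* True for the two asynchronous external aborts, where DFAR is meaningless. */
static bool fsr_is_async(uint32_t fsr)
{
    unsigned int fs = fsr_status(fsr);

    return fs == 22 || fs == 24;
}
```

As an example, a raw DFSR of 0x406 has bit 10 set and low bits 0110, giving status 22, the async bus error, so a caller could check fsr_is_async() first and skip printing DFAR in that case.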

In case of a bus error, it may be appropriate to just warn about it, or perhaps send a signal to the current process, although in the latter case it should have some means to distinguish it from a synchronous bus error.

At least on the Cortex-A8, a parity/ECC error (whether async or not) is to be regarded as absolutely fatal. Quoth the TRM: "No recovery is possible. The abort handler must disable the caches, communicate the fail directly with the external system, request a reboot."

Bit 10 no longer indicates an asynchronous (let alone imprecise) fault. Apart from the debug events and async aborts (and possibly some implementation-defined aborts), all aborts listed are synchronous, and DFAR/IFAR is valid. There's no technical obstruction to making these trappable via the kernel exception handling mechanism. (Though at least in case of parity/ECC errors one shouldn't.)