On Wed, Feb 11, 2015 at 2:28 PM, Pali Rohár <pali.rohar@xxxxxxxxx> wrote:
> On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
>> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@xxxxxxxxx> wrote:
>> >> Anyhow, since checking the firewalls/APs to see if you have
>> >> permission will probably only get you yet another fault if
>> >> things are walled off, the robust way of dealing with this
>> >> sort of situation is by probing the device with a read
>> >> while trapping bus faults. This also handles modules that
>> >> are unreachable for other reasons, e.g. being disabled by
>> >> eFuse.
>> >
>> > Is it possible to patch kernel code to mask or ignore that
>> > fault? Can you help me with something like that?
>>
>> As I mentioned, I'm still learning my way around the kernel,
>> so I don't feel very comfortable suggesting a concrete patch
>> just yet. I've been browsing arch/arm/mm/, however, and my
>> impression is that all that would be required is editing
>> fault.c to add a copy of do_bad whose body is instead
>>
>>     return user_mode(regs) || !fixup_exception(regs);
>>
>> and hooking it onto the appropriate fault codes. However, this
>> really needs the opinion of someone more familiar with this
>> code.
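The fixup approach described above can be modelled outside the kernel to show what the proposed return expression does. This is a simplified user-space sketch, not kernel code: struct pt_regs is cut down to two fields, the exception table holds made-up addresses, and do_bad_fixup is a hypothetical name for the modified copy of do_bad. user_mode() follows arch/arm's definition (CPSR mode bits [3:0] are zero only in USR mode), and fixup_exception() follows the behaviour of the ARM version: look up the faulting PC and, on a hit, redirect the PC to the fixup address. As in fault.c's handler table, returning 0 means "handled".

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct pt_regs. */
struct pt_regs {
	unsigned long ARM_pc;
	unsigned long ARM_cpsr;
};

/* Same test as arch/arm's user_mode(): mode bits [3:0] are 0 only in USR. */
static int user_mode(const struct pt_regs *regs)
{
	return (regs->ARM_cpsr & 0xf) == 0;
}

/* One entry: an instruction that is allowed to fault, and where to resume. */
struct exception_table_entry {
	unsigned long insn;
	unsigned long fixup;
};

/* Made-up table; in the kernel it is built from __ex_table sections. */
static const struct exception_table_entry ex_table[] = {
	{ 0xc0101000UL, 0xc0108000UL },
	{ 0xc0102040UL, 0xc0108010UL },
};

/* Model of fixup_exception(): on a hit, redirect the PC and return 1. */
static int fixup_exception(struct pt_regs *regs)
{
	size_t i;

	for (i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++) {
		if (ex_table[i].insn == regs->ARM_pc) {
			regs->ARM_pc = ex_table[i].fixup;
			return 1;
		}
	}
	return 0;
}

/* Hypothetical do_bad variant: 0 = handled, nonzero = unhandled. */
static int do_bad_fixup(unsigned long addr, unsigned int fsr,
			struct pt_regs *regs)
{
	(void)addr;
	(void)fsr;
	/* User-mode faults short-circuit and stay unhandled, as with do_bad. */
	return user_mode(regs) || !fixup_exception(regs);
}
```

A kernel-mode fault at a PC listed in the table returns 0 and resumes at the fixup address; a user-mode fault, or a kernel-mode fault with no table entry, still returns 1 and is treated as unhandled exactly as do_bad would. In an actual patch the handler would presumably be registered for the chosen fault codes with hook_fault_code(), and the probing read itself would need its own exception-table entry, as the kernel's uaccess helpers have.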
>>
>> I do have an observation to make on the issue of fault
>> decoding: the list in fsr-2level.c may be "standard ARMv3 and
>> ARMv4 aborts", but its entries are quite wrong for ARMv7, which
>> has:
>>
>> [ 0] -
>> [ 1] alignment fault
>> [ 2] debug event
>> [ 3] section access flag fault
>> [ 4] instruction cache maintenance fault (reported via data abort)
>> [ 5] section translation fault
>> [ 6] page access flag fault
>> [ 7] page translation fault
>> [ 8] bus error on access
>> [ 9] section domain fault
>> [10] -
>> [11] page domain fault
>> [12] bus error on section table walk
>> [13] section permission fault
>> [14] bus error on page table walk
>> [15] page permission fault
>> [16] (TLB conflict abort)
>> [17] -
>> [18] -
>> [19] -
>> [20] (lockdown abort)
>> [21] -
>> [22] async bus error (reported via data abort)
>> [23] -
>> [24] async parity/ECC error (reported via data abort)
>> [25] parity/ECC error on access
>> [26] (coprocessor abort)
>> [27] -
>> [28] parity/ECC error on section table walk
>> [29] -
>> [30] parity/ECC error on page table walk
>> [31] -
>>
>> Some entries are patched up near the bottom of fault.c, but
>> many bogus messages remain. For example, the "on linefetch" vs
>> "on non-linefetch" distinction is misleading, since no such
>> thing can be inferred from the fault status on v7. Also, the
>> i-cache maintenance fault handling looks wrong to me: it should
>> fetch the actual fault status from IFSR (even though the
>> address still comes from DFAR) and dispatch based on that.
>>
>> Async external aborts (async bus error and async parity/ECC
>> error) give you basically no info. DFAR will contain garbage,
>> so displaying it will confuse rather than enlighten; a
>> traceback is pointless, since the instruction that caused the
>> access is long retired; likewise, user_mode() doesn't matter,
>> since a transition to kernel space may have happened after
>> the access that caused the abort.
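The fault-status decoding discussed above can be sketched in C. This assumes the ARMv7 short-descriptor DFSR/IFSR format, in which the 5-bit fault status is split into FS[4] = bit 10 and FS[3:0] = bits 3:0; fsr_fs() packs it the same way as the kernel helper of that name in fsr-2level.c. The name table just transcribes the ARMv7 list above, and v7_fault_name, v7_fault_describe, v7_fault_is_async and v7_fault_is_parity are hypothetical names chosen for illustration. Bit 10 is used here only as FS[4], not as an "async" flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Pack FS[4] (bit 10) and FS[3:0] (bits 3:0) into a 0..31 index. */
static unsigned int fsr_fs(unsigned int fsr)
{
	return (fsr & 0xfu) | ((fsr >> 6) & 0x10u);
}

/* ARMv7 short-descriptor fault names; NULL marks reserved encodings. */
static const char *const v7_fault_name[32] = {
	[1]  = "alignment fault",
	[2]  = "debug event",
	[3]  = "section access flag fault",
	[4]  = "instruction cache maintenance fault",
	[5]  = "section translation fault",
	[6]  = "page access flag fault",
	[7]  = "page translation fault",
	[8]  = "bus error on access",
	[9]  = "section domain fault",
	[11] = "page domain fault",
	[12] = "bus error on section table walk",
	[13] = "section permission fault",
	[14] = "bus error on page table walk",
	[15] = "page permission fault",
	[16] = "TLB conflict abort",
	[20] = "lockdown abort",
	[22] = "async bus error",
	[24] = "async parity/ECC error",
	[25] = "parity/ECC error on access",
	[26] = "coprocessor abort",
	[28] = "parity/ECC error on section table walk",
	[30] = "parity/ECC error on page table walk",
};

static const char *v7_fault_describe(unsigned int fsr)
{
	const char *name = v7_fault_name[fsr_fs(fsr)];

	return name != NULL ? name : "reserved/unknown";
}

/* Async external aborts: DFAR is garbage and the PC is not the culprit. */
static bool v7_fault_is_async(unsigned int fsr)
{
	unsigned int fs = fsr_fs(fsr);

	return fs == 22 || fs == 24;
}

/* Parity/ECC errors, synchronous or not. */
static bool v7_fault_is_parity(unsigned int fsr)
{
	unsigned int fs = fsr_fs(fsr);

	return fs == 24 || fs == 25 || fs == 28 || fs == 30;
}
```

For example, a DFSR value of 0x1406 decodes to FS = 22 (async bus error), so a handler built along these lines could skip printing DFAR and a traceback, while any of the parity/ECC codes would be routed to a fatal path instead.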
>> Basically they should be treated more as an IRQ than as a
>> fault (note they can also be masked just like IRQs). In case
>> of a bus error, it may be appropriate to just warn about it,
>> or perhaps send a signal to the current process, although in
>> the latter case it should have some means to distinguish it
>> from a synchronous bus error.
>>
>> At least on the Cortex-A8, a parity/ECC error (whether async
>> or not) is to be regarded as absolutely fatal. Quoth the
>> TRM: "No recovery is possible. The abort handler must disable
>> the caches, communicate the fail directly with the external
>> system, request a reboot."
>>
>> Bit 10 no longer indicates an asynchronous (let alone
>> imprecise) fault. Apart from the debug events and async
>> aborts (and possibly some implementation-defined aborts), all
>> aborts listed are synchronous, and DFAR/IFAR is valid.
>> There's no technical obstacle to making these trappable via
>> the kernel exception handling mechanism (though at least in
>> the case of parity/ECC errors one shouldn't).
>
> Tony, Nishanth, or somebody else... can you help with memory
> management? Or do you know an expert on the arch/arm/mm/ code?

Folks in linux-arm-kernel are probably the right people, I suppose.
Looping them in.

-- 
---
Regards,
Nishanth Menon
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html