On Thu, Sep 13, 2018 at 8:09 PM, 'Nick Desaulniers' via kasan-dev <kasan-dev@xxxxxxxxxxxxxxxx> wrote: > On Thu, Sep 13, 2018 at 1:37 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: >> >> On Wed, Sep 12, 2018 at 7:39 PM, Jann Horn <jannh@xxxxxxxxxx> wrote: >> > On Wed, Sep 12, 2018 at 7:16 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: >> >> On Wed, Aug 29, 2018 at 1:35 PM, Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote: >> > [...] >> >> > +static int khwasan_handler(struct pt_regs *regs, unsigned int esr) >> >> > +{ >> >> > + bool recover = esr & KHWASAN_ESR_RECOVER; >> >> > + bool write = esr & KHWASAN_ESR_WRITE; >> >> > + size_t size = KHWASAN_ESR_SIZE(esr); >> >> > + u64 addr = regs->regs[0]; >> >> > + u64 pc = regs->pc; >> >> > + >> >> > + if (user_mode(regs)) >> >> > + return DBG_HOOK_ERROR; >> >> > + >> >> > + kasan_report(addr, size, write, pc); >> >> > + >> >> > + /* >> >> > + * The instrumentation allows to control whether we can proceed after >> >> > + * a crash was detected. This is done by passing the -recover flag to >> >> > + * the compiler. Disabling recovery allows to generate more compact >> >> > + * code. >> >> > + * >> >> > + * Unfortunately disabling recovery doesn't work for the kernel right >> >> > + * now. KHWASAN reporting is disabled in some contexts (for example when >> >> > + * the allocator accesses slab object metadata; same is true for KASAN; >> >> > + * this is controlled by current->kasan_depth). All these accesses are >> >> > + * detected by the tool, even though the reports for them are not >> >> > + * printed. >> >> > + * >> >> > + * This is something that might be fixed at some point in the future. >> >> > + */ >> >> > + if (!recover) >> >> > + die("Oops - KHWASAN", regs, 0); >> >> >> >> Why die and not panic? Die seems to be much less used function, and it >> >> calls panic anyway, and we call panic in kasan_report if panic_on_warn >> >> is set. >> > >> > die() is vaguely equivalent to BUG(); die() and BUG() normally only >> > terminate the current process, which may or may not leave the system >> > somewhat usable, while panic() always brings down the whole system. >> > AFAIK panic() shouldn't be used unless you're in some very low-level >> > code where you know that trying to just kill the current process can't >> > work and the entire system is broken beyond repair. >> > >> > If KASAN traps on some random memory access, there's a good chance >> > that just killing the current process will allow at least parts of the >> > system to continue. I'm not sure whether BUG() or die() is more >> > appropriate here, but I think it definitely should not be a panic(). >> >> >> Nick, do you know if die() will be enough to catch problems on Android >> phones? panic_on_warn would turn this into panic, but I guess one does >> not want panic_on_warn on a canary phone. > > die() has arch specific implementations, so looking at: > > arch/arm64/kernel/traps.c:196#die > > it looks like panic is invoked if in_interrupt() or panic_on_oops(), > which is a configure option. So maybe the config for KHWASAN should > also enable that? Otherwise seems easy to forget. But maybe that > should remain configurable separately? > > Looking at the kernel configs for the Pixel 2, it does seem like > CONFIG_PANIC_ON_OOPS=y is already enabled. > https://android.googlesource.com/kernel/msm/+/android-msm-wahoo-4.4-pie/arch/arm64/configs/wahoo_defconfig#746 Then I think we are good here. > Specifically to catch problems on Android, our internal debug builds > can report on panics, but not oops, IIUC.