On Tue, 19 Nov 2019, Marco Elver wrote: > On Tue, 19 Nov 2019 at 21:13, Qian Cai <cai@xxxxxx> wrote: > > > > On Thu, 2019-11-14 at 19:02 +0100, 'Marco Elver' via kasan-dev wrote: > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > only enables KCSAN for x86, but we expect adding support for other > > > architectures is relatively straightforward (we are aware of > > > experimental ARM64 and POWER support). > > > > This does not allow the system to boot. Just hang forever at the end. > > > > https://cailca.github.io/files/dmesg.txt > > > > the config (dselect KASAN and select KCSAN with default options): > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > Thanks! That config enables lots of other debug code. I could > reproduce the hang. It's related to CONFIG_PROVE_LOCKING etc. > > The problem is definitely not the fact that kcsan_setup_watchpoint > disables interrupts (tested by removing that code). Although lockdep > still complains here, and looking at the code in kcsan/core.c, I just > can't see how local_irq_restore cannot be called before returning (in > the stacktrace you provided, there is no kcsan function), and > interrupts should always be re-enabled. (Interrupts are only disabled > during delay in kcsan_setup_watchpoint.) > > What I also notice is that this happens when the console starts > getting spammed with data-race reports (presumably because some extra > debug code has lots of data races according to KCSAN). > > My guess is that some of the extra debug logic enabled in that config > is incompatible with KCSAN. However, so far I cannot tell where > exactly the problem is. For now the work-around would be not using > KCSAN with these extra debug options. I will investigate more, but > nothing obviously wrong stands out.. It seems that due to spinlock_debug.c containing data races, the console gets spammed with reports. However, it's also possible to encounter deadlock, e.g. printk lock -> spinlock_debug -> KCSAN detects data race -> kcsan_print_report() -> printk lock -> deadlock. So the best thing is to fix the data races in spinlock_debug. I will send a patch separately for you to test. The issue that lockdep still reports inconsistency in IRQ flags tracing I cannot yet say what the problem is. It seems that lockdep IRQ flags tracing may have an issue with KCSAN for numerous reasons: let's say lockdep and IRQ flags tracing code is instrumented, which then calls into KCSAN, which disables/enables interrupts, but due to tracing calls back into lockdep code. In other words, there may be some recursion which corrupts hardirqs_enabled. Thanks, -- Marco