On Thu, Nov 9, 2017 at 11:51 AM, Patrick McLean <chutzpah@xxxxxxxxxx> wrote:
>
> We do have CONFIG_GCC_PLUGIN_STRUCTLEAK and
> CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL enabled on these boxes as well as
> CONFIG_GCC_PLUGIN_RANDSTRUCT as you pointed out before.

It might be worth just verifying without RANDSTRUCT in particular.
That case has probably not gotten a huge amount of testing.

As Al points out, it can cause absolutely horrendous cache access
pattern changes, but it might also be triggering some corruption in
case there's a problem with the plugin, or with some piece of kernel
code that gets confused by it.

And most obviously: if there is some module or part of the kernel that
got compiled with a different seed for the randstruct hashing, that
will break in nasty nasty ways. Your out-of-kernel module is the
obvious suspect for something like that, but honestly, it could be
some missing build dependency, or simply a missing special case in the
plugin itself, a missing __no_randomize_layout, or any number of
things.

We've hit gcc bugs many times before - and the plugins are just new
opportunities to hit cases that have gotten a lot less testing than
the "normal" code flow has.

The structleak plugin is much less likely to be a problem (simply
because it's a much simpler plugin), but hey, something being NULL
when it shouldn't possibly be might be a stray "leak initialization".

So since you seem to be able to reproduce this _reasonably_ easily,
it's definitely worth checking that it still reproduces even without
the gcc plugins. Just to narrow it down a bit.

               Linus
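For reference, the narrowing-down suggested above might look like the two .config variants below. This is a minimal sketch, assuming the standard Kconfig symbols: CONFIG_GCC_PLUGIN_RANDSTRUCT is named in the thread, while CONFIG_GCC_PLUGINS (the umbrella option the individual plugins depend on) is an addition not mentioned in it. Because a randstruct seed mismatch is one of the suspects, the kernel and the out-of-tree module would both need to be rebuilt against the same tree for each variant.

```
# Variant 1 (illustrative): only randstruct disabled, structleak left as-is
# CONFIG_GCC_PLUGIN_RANDSTRUCT is not set

# Variant 2 (illustrative): no gcc plugins at all
# CONFIG_GCC_PLUGINS is not set
```

If the problem goes away with variant 1, randstruct (or something built with a mismatched seed) is the likely culprit; if it only goes away with variant 2, structleak becomes the next suspect.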