On Fri, Aug 12, 2022 at 11:29:44AM +0200, Dmitry Vyukov wrote:
> On Fri, 12 Aug 2022 at 02:11, Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
> >
> > On Thu, Aug 11, 2022 at 02:00:59PM -0700, Kees Cook wrote:
> > > On Thu, Aug 11, 2022 at 11:51:34AM -0700, Ira Weiny wrote:
> > > > On Thu, Aug 11, 2022 at 10:39:29AM -0700, Ira wrote:
> > > > > On Thu, Aug 11, 2022 at 08:33:16AM -0700, Kees Cook wrote:
> > > > > > Hi Fabio,
> > > > > >
> > > > > > It seems likely that the kmap change[1] might be causing this crash. Is
> > > > > > there a boot-time setup race between kmap being available and early umh
> > > > > > usage?
> > > > >
> > > > > I don't see how this is a setup problem with the config reported here.
> > > > >
> > > > > CONFIG_64BIT=y
> > > > >
> > > > > ...and HIGHMEM is not set.
> > > > > ...and PREEMPT_RT is not set.
> > > > >
> > > > > So the kmap_local_page() call in that stack should be a page_address() only.
> > > > >
> > > > > I think the issue must be some sort of race which was being prevented because
> > > > > of the preemption and/or pagefault disable built into kmap_atomic().
> > > > >
> > > > > Is this reproducable?
> > > > >
> > > > > The hunk below will surely fix it but I think the pagefault_disable() is
> > > > > the only thing that is required. It would be nice to test it.
> > > >
> > > > Fabio and I discussed this. And he also mentioned that pagefault_disable() is
> > > > all that is required.
> > >
> > > Okay, sounds good.
> > >
> > > > Do we have a way to test this?
> > >
> > > It doesn't look like syzbot has a reproducer yet, so its patch testing
> > > system[1] will not work. But if you can send me a patch, I could land it
> > > in -next and we could see if the reproduction frequency drops to zero.
> > > (Looking at the dashboard, it's seen 2 crashes, most recently 8 hours
> > > ago.)
> >
> > Patch sent.
> >
> > https://lore.kernel.org/lkml/20220812000919.408614-1-ira.weiny@xxxxxxxxx/

Thank you!

> >
> > But I'm more confused after looking at this again.
>
> There is splat of random crashes in linux-next happened at the same time:
>
> https://groups.google.com/g/syzkaller-bugs/search?q=%22linux-next%20boot%20error%3A%22
>
> There are 10 different crashes in completely random places.
> I would assume they have the same root cause, some silent memory
> corruption or something similar.

Yeah, I noticed the crashes stopped "on their own", so I think I'll wait
a bit more, and if it starts back up, we can try Ira's patch, though I'd
agree with the assessment that it looks like it shouldn't be needed.

-Kees

-- 
Kees Cook