Re: WARNING at asm/kfence.h:44 kfence_protect_page

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Mon, 14 Jun 2021 15:29:22 -0600

On Mon, Jun 14, 2021 at 2:43 PM Filipe Manana <fdmanana@xxxxxxxxx> wrote:
>
> On Mon, Jun 14, 2021 at 9:07 PM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> > No, but Fedora debug kernels do have it enabled so I can ask the
> > reporter to use 5.12.10-debug or 5.13-rc6. Any preference?
>
> No preference.
> It's a problem that has been around for several years now, and not
> something recent.
>
> Ok, so it was reported before. Anywhere public, or was it something private?

https://bugzilla.redhat.com/show_bug.cgi?id=1971327

>
> >
> >
> > > That would trigger an assertion right away:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
> > >
> > > With assertions disabled, we just cast current->journal_info into a
> > > transaction handle and dereference it, which is obviously wrong since
> > > ->jornal_info has a value that is not a pointer to a transaction
> > > handle, resulting in the crash.
> > >
> > > How easy is it to reproduce for you?
> >
> > I'm not sure.
> >
> > Do you think following this splat we could have somehow been IO
> > limited or stalled without any other kernel messages appearing? That
> > would then cause reclaim activity, and high memory and swap usage? I'm
> > wondering if it's likely or unlikely, because at the moment I think
> > it's unlikely since /var/log/journal was successfully recording
> > systemd-journald journals for the 81 minutes between this call trace
> > and when systemd-oomd started killing things. I'm trying to separate
> > out what's causing what.
>
> Don't know about that. And honestly I don't think that it matters.
> The send task writes to a pipe and writing to a pipe allocates pages
> without GFP_NOFS (with GFP_HIGHUSER | __GFP_ACCOUNT to be precise),
> meaning we can end up evicting an inode, which in turn needs to start
> a transaction.
>
> That patch would be my preferred way to fix it.
> If it doesn't work for any reason, then we can simply set up a NOFS
> context surrounding the calls in send to write to the pipe, like this:
>
> https://pastebin.com/raw/tTrfAw0m
> (also attached)
>
> I would have to test the first patch in a scenario under heavy memory
> pressure to see if there isn't anything I might have missed, but I
> think it should work just fine.
> I'll test it tomorrow too.
>

The system was under heavy swap and memory pressure, but the logs
don't tell us the contributors or their relative amounts. But logs
show that Bees continued to make progress for almost an hour and a
half after the splat, up until the wayland+shell service was killed by
oomd. I'd expect any blocked tasks to result in warnings about it
every 2 minutes in the journal.

Swap being on zram during the memory pressure period might be a
complicating factor, combining IO, CPU and memory pressure.

-- 
Chris Murphy