Re: [PATCH v2] vmalloc: Fix issues with flush flag

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Fri, 24 May 2019 15:50:48 +0000

On Wed, 2019-05-22 at 15:40 -0700, Rick Edgecombe wrote:
> On Wed, 2019-05-22 at 12:26 -0700, Rick Edgecombe wrote:
> > On Wed, 2019-05-22 at 10:40 -0700, David Miller wrote:
> > > From: "Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx>
> > > Date: Tue, 21 May 2019 01:59:54 +0000
> > > 
> > > > On Mon, 2019-05-20 at 18:43 -0700, David Miller wrote:
> > > > > From: "Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx>
> > > > > Date: Tue, 21 May 2019 01:20:33 +0000
> > > > > 
> > > > > > > Should it handle executing an unmapped page gracefully? Because this
> > > > > > > change is causing that to happen much earlier. If something was relying
> > > > > > > on a cached translation to execute something it could find the mapping
> > > > > > > disappear.
> > > > > 
> > > > > > Does this work by not mapping any kernel mappings at the beginning,
> > > > > > and then filling in the BPF mappings in response to faults?
> > > > > No, nothing too fancy. It just flushes the vm mapping immediately in
> > > > > vfree for execute (and RO) mappings. The only thing that happens around
> > > > > allocation time is setting of a new flag to tell vmalloc to do the
> > > > > flush.
> > > > 
> > > > > The problem before was that the pages would be freed before the execute
> > > > > mapping was flushed. So then when the pages got recycled, random,
> > > > > sometimes coming from userspace, data would be mapped as executable in
> > > > > the kernel by the un-flushed tlb entries.
> > > 
> > > > If I am to understand things correctly, there was a case where 'end'
> > > > could be smaller than 'start' when doing a range flush.  That would
> > > > definitely kill some of the sparc64 TLB flush routines.
> > 
> > Ok, thanks.
> > 
> > > The patch at the beginning of this thread doesn't have that behavior
> > > though and it apparently still hung. I asked if Meelis could test with
> > > this feature disabled and DEBUG_PAGEALLOC on, since it flushes on every
> > > vfree and is not new logic, and also with a patch that logs exact TLB
> > > flush ranges and fault addresses on top of the kernel having this
> > > issue. Hopefully that will shed some light.
> > 
> > > Sorry for all the noise and speculation on this. It has been difficult
> > > to debug remotely with a tester and developer in different time zones.
> > 
> > 
> Ok, so with a patch to disable setting the new vmalloc flush flag on
> architectures that have normal memory as executable (includes sparc),
> boot succeeds.
> 
> With this disable patch and DEBUG_PAGEALLOC on, it hangs earlier than
> before. Going from clues in other logs, it looks like it hangs right at
> the first normal vfree.
> 
> Thanks for all the testing Meelis!
> 
> So it seems like other, not new, TLB flushes also trigger the hang.
> 
> From earlier logs provided, this vfree would be the first call to
> flush_tlb_kernel_range(), and before any BPF allocations appear in the
> logs. So I am suspecting some other cause than the bisected patch at
> this point, but I guess it's not fully conclusive.
> 
> It could be informative to bisect upstream again with the
> DEBUG_PAGEALLOC configs on, to see if it indeed points to an earlier
> commit.

So now Meelis has found that the commit before any of my vmalloc
changes also hangs during boot with DEBUG_PAGEALLOC on. It hangs
shortly after the first vfree. With DEBUG_PAGEALLOC, that vfree triggers
a flush_tlb_kernel_range() on the allocation, just like my vmalloc
changes do on certain vmallocs. The upstream code calls
vm_unmap_aliases() instead of calling flush_tlb_kernel_range() directly,
but we also tested a version that called the flush directly on just the
allocation and it also hung. So it seems like issues flushing vmallocs
on this platform exist outside my commits.
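
To make "the flush directly on just the allocation" concrete, here is a
minimal userspace sketch of how such a range can be derived and
sanity-checked (end above start, as raised earlier in the thread). It is
not the code that was tested. struct flush_range, range_for_alloc(),
range_is_sane() and the 8 KiB MODEL_PAGE_SIZE are hypothetical names and
values chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8 KiB pages for this model; a kernel would use PAGE_SIZE. */
#define MODEL_PAGE_SIZE ((uintptr_t)8192)

/* Hypothetical description of a range handed to a TLB flush routine. */
struct flush_range {
	uintptr_t start;	/* inclusive, page aligned */
	uintptr_t end;		/* exclusive, page aligned */
};

/*
 * Build the range that covers a single allocation of `size` bytes at
 * `addr`, rounded outwards to whole pages.
 */
struct flush_range range_for_alloc(uintptr_t addr, size_t size)
{
	struct flush_range r;

	r.start = addr & ~(MODEL_PAGE_SIZE - 1);
	r.end = (addr + size + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
	return r;
}

/* A range is only safe to hand to a flush routine if start < end. */
bool range_is_sane(struct flush_range r)
{
	return r.start < r.end;
}
```

A check like range_is_sane() in front of the flush call would separate
"the caller passed a bad range" from "the flush routine hangs on a valid
range", which is the distinction that matters here.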

How do people feel about calling this a sparc-specific issue uncovered
by my patch, rather than caused by it, at this point?

If people agree with this assessment, it of course still seems like the
new changes turn the root cause into a more impactful issue for this
specific combination. On the other hand, I am not the right person to
fix the root cause, for several reasons including no hardware access.

Otherwise I could submit a patch to disable the new vmalloc flush flag
for sparc, since it doesn't really get a security benefit from it
anyway. What do people think?