Re: [PATCH] arm: fix page faults in do_alignment

Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> · Fri, 30 Aug 2019 23:29:06 +0100

On Fri, Aug 30, 2019 at 04:02:48PM -0500, Eric W. Biederman wrote:
> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
> 
> > On Fri, Aug 30, 2019 at 02:45:36PM -0500, Eric W. Biederman wrote:
> >> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
> >> 
> >> > On Fri, Aug 30, 2019 at 09:31:17PM +0800, Jing Xiangfeng wrote:
> >> >> The function do_alignment can handle misaligned address for user and
> >> >> kernel space. If it is a userspace access, do_alignment may fail on
> >> >> a low-memory situation, because page faults are disabled in
> >> >> probe_kernel_address.
> >> >> 
> >> >> Fix this by using __copy_from_user instead of probe_kernel_address.
> >> >> 
> >> >> Fixes: b255188 ("ARM: fix scheduling while atomic warning in alignment handling code")
> >> >> Signed-off-by: Jing Xiangfeng <jingxiangfeng@xxxxxxxxxx>
> >> >
> >> > NAK.
> >> >
> >> > The "scheduling while atomic warning in alignment handling code" is
> >> > caused by fixing up the page fault while trying to handle the
> >> > mis-alignment fault generated from an instruction in atomic context.
> >> >
> >> > Your patch re-introduces that bug.
> >> 
> >> And the patch that fixed scheduling while atomic apparently introduced a
> >> regression.  Admittedly a regression that took 6 years to track down but
> >> still.
> >
> > Right, and given the number of years, we are trading one regression for
> > a different regression.  If we revert to the original code where we
> > fix up, we will end up with people complaining about a "new" regression
> > caused by reverting the previous fix.  Follow this policy and we just
> > end up constantly reverting the previous revert.
> >
> > The window is very small - the page in question will have had to have
> > instructions read from it immediately prior to the handler being entered,
> > and would have had to be made "old" before subsequently being unmapped.
> 
> > Rather than excessively complicating the code and making it even more
> > inefficient (as in your patch), we could instead retry executing the
> > instruction when we discover that the page is unavailable, which should
> > cause the page to be paged back in.
> 
> My patch does not introduce any inefficiencies.  It only moves the
> check for user_mode up a bit.  My patch did duplicate the code.
> 
> > If the page really is unavailable, the prefetch abort should cause a
> > SEGV to be raised, otherwise the re-execution should replace the page.
> >
> > The danger to that approach is we page it back in, and it gets paged
> > back out before we're able to read the instruction, indefinitely.
> 
> I would think either a little code duplication or a function that looks
> at user_mode(regs) and picks the appropriate kind of copy to do would be
> the best way to go.  Because what needs to happen in the two cases for
> reading the instruction are almost completely different.

That is what I mean.  I'd prefer to avoid that with the large chunk of
code.  How about instead adding a local replacement for
probe_kernel_address() that just sorts out the reading, rather than
duplicating all the code to deal with thumb fixup?
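
As a rough illustration of that idea only, here is a userspace mock, not
kernel code and not a proposed patch.  Every name in it (mock_page,
mock_probe_nofault, mock_copy_may_fault, alignment_fetch_insn) is
hypothetical and stands in for the real accessors.  It assumes the only
difference between the two cases is whether the read is allowed to fault
the page back in:

```c
/* Userspace mock of the "one small read helper" idea.  All names here are
 * hypothetical; none of them are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum fetch_result { FETCH_OK = 0, FETCH_FAULT = 1 };

/* Stand-in for the page holding the faulting instruction. */
struct mock_page {
	uint32_t insn;       /* instruction word stored in the page */
	bool present;        /* false: page is currently paged out */
	bool can_page_in;    /* false: page is really unavailable */
};

/* Models a read with page faults disabled (the role probe_kernel_address()
 * plays): it never pages anything in, so a paged-out page is an error. */
static int mock_probe_nofault(struct mock_page *pg, uint32_t *out)
{
	if (!pg->present)
		return -1;
	*out = pg->insn;
	return 0;
}

/* Models a user-space read with page faults enabled: a paged-out page is
 * brought back in if possible, otherwise the read fails. */
static int mock_copy_may_fault(struct mock_page *pg, uint32_t *out)
{
	if (!pg->present) {
		if (!pg->can_page_in)
			return -1;
		pg->present = true;
	}
	*out = pg->insn;
	return 0;
}

/* The local replacement: pick the kind of read based on the mode the fault
 * came from, so callers (e.g. the thumb fixup path) are not duplicated. */
static enum fetch_result alignment_fetch_insn(bool user_mode,
					      struct mock_page *pg,
					      uint32_t *out)
{
	int err;

	assert(pg != NULL && out != NULL);
	err = user_mode ? mock_copy_may_fault(pg, out)
			: mock_probe_nofault(pg, out);
	return err ? FETCH_FAULT : FETCH_OK;
}
```

With a paged-out but recoverable page, the kernel-mode path in this mock
reports a fault while the user-mode path pages it in and returns the
instruction.  Whether the real accessors can be split this cleanly is an
assumption that would need checking against the actual handler.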

> > However, as it's impossible for me to contact the submitter, anything
> > I do will be poking about in the dark and without any way to validate
> > that it does fix the problem, so I think apart from reviewing of any
> > patches, there's not much I can do.
> 
> I didn't realize your emails to him were bouncing.  That is odd.  Mine
> don't appear to be.

Hmm, so the fact I posted publicly in reply to my reply with the MTA
bounce message didn't give you a clue?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up