On Fri, Aug 30, 2019 at 04:02:48PM -0500, Eric W. Biederman wrote:
> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
>
> > On Fri, Aug 30, 2019 at 02:45:36PM -0500, Eric W. Biederman wrote:
> >> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
> >>
> >> > On Fri, Aug 30, 2019 at 09:31:17PM +0800, Jing Xiangfeng wrote:
> >> >> The function do_alignment can handle misaligned address for user and
> >> >> kernel space. If it is a userspace access, do_alignment may fail on
> >> >> a low-memory situation, because page faults are disabled in
> >> >> probe_kernel_address.
> >> >>
> >> >> Fix this by using __copy_from_user instead of probe_kernel_address.
> >> >>
> >> >> Fixes: b255188 ("ARM: fix scheduling while atomic warning in alignment handling code")
> >> >> Signed-off-by: Jing Xiangfeng <jingxiangfeng@xxxxxxxxxx>
> >> >
> >> > NAK.
> >> >
> >> > The "scheduling while atomic warning in alignment handling code" is
> >> > caused by fixing up the page fault while trying to handle the
> >> > mis-alignment fault generated from an instruction in atomic context.
> >> >
> >> > Your patch re-introduces that bug.
> >>
> >> And the patch that fixed scheduling while atomic apparently introduced a
> >> regression. Admittedly a regression that took 6 years to track down but
> >> still.
> >
> > Right, and given the number of years, we are trading one regression for
> > a different regression. If we revert to the original code where we
> > fix up, we will end up with people complaining about a "new" regression
> > caused by reverting the previous fix. Follow this policy and we just
> > end up constantly reverting the previous revert.
> >
> > The window is very small - the page in question will have had to have
> > instructions read from it immediately prior to the handler being entered,
> > and would have had to be made "old" before subsequently being unmapped.
> >
> > Rather than excessively complicating the code and making it even more
> > inefficient (as in your patch), we could instead retry executing the
> > instruction when we discover that the page is unavailable, which should
> > cause the page to be paged back in.
>
> My patch does not introduce any inefficiencies. It only moves the
> check for user_mode up a bit. My patch did duplicate the code.
>
> > If the page really is unavailable, the prefetch abort should cause a
> > SEGV to be raised, otherwise the re-execution should replace the page.
> >
> > The danger to that approach is we page it back in, and it gets paged
> > back out before we're able to read the instruction indefinitely.
>
> I would think either a little code duplication or a function that looks
> at user_mode(regs) and picks the appropriate kind of copy to do would be
> the best way to go. Because what needs to happen in the two cases for
> reading the instruction are almost completely different.

That is what I mean. I'd prefer to avoid that with the large chunk of
code. How about instead adding a local replacement for
probe_kernel_address() that just sorts out the reading, rather than
duplicating all the code to deal with thumb fixup?

> > However, as it's impossible for me to contact the submitter, anything
> > I do will be poking about in the dark and without any way to validate
> > that it does fix the problem, so I think apart from reviewing of any
> > patches, there's not much I can do.
>
> I didn't realize your emails to him were bouncing. That is odd. Mine
> don't appear to be.

Hmm, so the fact I posted publicly in reply to my reply with the MTA
bounce message didn't give you a clue?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up