Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:

> On Fri, Aug 30, 2019 at 04:02:48PM -0500, Eric W. Biederman wrote:
>> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
>>
>> > On Fri, Aug 30, 2019 at 02:45:36PM -0500, Eric W. Biederman wrote:
>> >> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
>> >>
>> >> > On Fri, Aug 30, 2019 at 09:31:17PM +0800, Jing Xiangfeng wrote:
>> >> >> The function do_alignment can handle misaligned address for user and
>> >> >> kernel space. If it is a userspace access, do_alignment may fail on
>> >> >> a low-memory situation, because page faults are disabled in
>> >> >> probe_kernel_address.
>> >> >>
>> >> >> Fix this by using __copy_from_user instead of probe_kernel_address.
>> >> >>
>> >> >> Fixes: b255188 ("ARM: fix scheduling while atomic warning in alignment handling code")
>> >> >> Signed-off-by: Jing Xiangfeng <jingxiangfeng@xxxxxxxxxx>
>> >> >
>> >> > NAK.
>> >> >
>> >> > The "scheduling while atomic warning in alignment handling code" is
>> >> > caused by fixing up the page fault while trying to handle the
>> >> > mis-alignment fault generated from an instruction in atomic context.
>> >> >
>> >> > Your patch re-introduces that bug.
>> >>
>> >> And the patch that fixed scheduling while atomic apparently introduced a
>> >> regression. Admittedly a regression that took 6 years to track down, but
>> >> still.
>> >
>> > Right, and given the number of years, we are trading one regression for
>> > a different regression. If we revert to the original code where we
>> > fix up, we will end up with people complaining about a "new" regression
>> > caused by reverting the previous fix. Follow this policy and we just
>> > end up constantly reverting the previous revert.
>> >
>> > The window is very small - the page in question will have had to have
>> > instructions read from it immediately prior to the handler being entered,
>> > and would have had to be made "old" before subsequently being unmapped.
>>
>> > Rather than excessively complicating the code and making it even more
>> > inefficient (as in your patch), we could instead retry executing the
>> > instruction when we discover that the page is unavailable, which should
>> > cause the page to be paged back in.
>>
>> My patch does not introduce any inefficiencies. It only moves the
>> check for user_mode up a bit. My patch did duplicate the code.
>>
>> > If the page really is unavailable, the prefetch abort should cause a
>> > SEGV to be raised, otherwise the re-execution should replace the page.
>> >
>> > The danger to that approach is we page it back in, and it gets paged
>> > back out before we're able to read the instruction indefinitely.
>>
>> I would think either a little code duplication or a function that looks
>> at user_mode(regs) and picks the appropriate kind of copy to do would be
>> the best way to go. Because what needs to happen in the two cases for
>> reading the instruction is almost completely different.
>
> That is what I mean. I'd prefer to avoid that with the large chunk of
> code. How about instead adding a local replacement for
> probe_kernel_address() that just sorts out the reading, rather than
> duplicating all the code to deal with thumb fixup.

So something like this should be fine?

Jing Xiangfeng, can you test this please?  I think this fixes your issue,
but I don't currently have an arm development box where I could test this.
diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 04b36436cbc0..b07d17ca0ae5 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -767,6 +767,23 @@ do_alignment_t32_to_handler(unsigned long *pinstr, struct pt_regs *regs,
 	return NULL;
 }
 
+static inline unsigned long
+copy_instr(bool umode, void *dst, unsigned long instrptr, size_t size)
+{
+	unsigned long result;
+	if (umode) {
+		void __user *src = (void *)instrptr;
+		result = copy_from_user(dst, src, size);
+	} else {
+		void *src = (void *)instrptr;
+		result = probe_kernel_read(dst, src, size);
+	}
+	/* Convert short reads into -EFAULT */
+	if ((result >= 0) && (result < size))
+		result = -EFAULT;
+	return result;
+}
+
 static int
 do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
@@ -778,22 +795,24 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	u16 tinstr = 0;
 	int isize = 4;
 	int thumb2_32b = 0;
+	bool umode;
 
 	if (interrupts_enabled(regs))
 		local_irq_enable();
 
 	instrptr = instruction_pointer(regs);
+	umode = user_mode(regs);
 
 	if (thumb_mode(regs)) {
-		u16 *ptr = (u16 *)(instrptr & ~1);
-		fault = probe_kernel_address(ptr, tinstr);
+		unsigned long tinstrptr = instrptr & ~1;
+		fault = copy_instr(umode, &tinstr, tinstrptr, 2);
 		tinstr = __mem_to_opcode_thumb16(tinstr);
 		if (!fault) {
 			if (cpu_architecture() >= CPU_ARCH_ARMv7 &&
 			    IS_T32(tinstr)) {
 				/* Thumb-2 32-bit */
 				u16 tinst2 = 0;
-				fault = probe_kernel_address(ptr + 1, tinst2);
+				fault = copy_instr(umode, &tinst2, tinstrptr + 2, 2);
 				tinst2 = __mem_to_opcode_thumb16(tinst2);
 				instr = __opcode_thumb32_compose(tinstr, tinst2);
 				thumb2_32b = 1;
@@ -803,7 +822,7 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 			}
 		}
 	} else {
-		fault = probe_kernel_address((void *)instrptr, instr);
+		fault = copy_instr(umode, &instr, instrptr, 4);
 		instr = __mem_to_opcode_arm(instr);
 	}
 
@@ -812,7 +831,7 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		goto bad_or_fault;
 	}
 
-	if (user_mode(regs))
+	if (umode)
 		goto user;
 
 	ai_sys += 1;
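
As a rough idea for exercising the user-mode path of copy_instr() (only a
sketch I have not run; it does not reproduce the paged-out/low-memory case,
and it assumes user-space alignment fixup is enabled with
"echo 2 > /proc/cpu/alignment"), something like the following should trap
into do_alignment() with a faulting instruction that has to be read from
user space:

/*
 * Untested sketch: force an LDRD from a misaligned user address so that
 * the kernel alignment handler has to fetch the faulting instruction
 * from user-space text in order to emulate it.
 *
 * Build with: gcc -marm -O2 -o align-test align-test.c
 * Needs "echo 2 > /proc/cpu/alignment" so user faults are fixed up
 * instead of raising SIGBUS.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	static char buf[16] __attribute__((aligned(8)));
	unsigned long long v;
	char *p = buf + 1;		/* deliberately misaligned */

	memset(buf, 0xa5, sizeof(buf));

	/*
	 * LDRD to a non-word-aligned address always generates an
	 * alignment fault, so the kernel has to read the LDRD opcode
	 * from our text page to emulate the access.
	 */
	asm volatile("ldrd %0, %H0, [%1]"
		     : "=&r" (v) : "r" (p) : "memory");

	printf("read 0x%016llx from misaligned address %p\n", v, p);
	return 0;
}

LDRD (like LDM/STM) faults on unaligned addresses regardless of the
unaligned-access configuration, so this always goes through the handler's
user-mode instruction read, which is the path the patch changes.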