----- On Dec 21, 2015, at 3:03 AM, James Bottomley James.Bottomley@xxxxxxxxxxxxxxxxxxxxx wrote: > On Sun, 2015-12-20 at 21:35 +0100, Helge Deller wrote: >> On 20.12.2015 17:50, James Bottomley wrote: >> > On Sun, 2015-12-20 at 16:49 +0100, Helge Deller wrote: >> > > On 20.12.2015 15:09, Mathieu Desnoyers wrote: >> > > > ----- On Dec 20, 2015, at 8:59 AM, Mathieu Desnoyers >> > > > mathieu.desnoyers@xxxxxxxxxxxx wrote: >> > > > >> > > > > ----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@xxxxxx >> > > > > wro >> > > > > te: >> > > > > >> > > > > > On parisc syscalls which are interrupted by signals >> > > > > > sometimes >> > > > > > fail to restart >> > > > > > and instead return -ENOSYS which then in the worst case >> > > > > > lead to >> > > > > > userspace >> > > > > > crashes. >> > > > > > A similiar problem existed on MIPS and was fixed by commit >> > > > > > e967ef02 >> > > > > > ("MIPS: Fix restart of indirect syscalls"). >> > > > > > >> > > > > > On parisc the current syscall restart code assumes hat the >> > > > > > syscall number is >> > > > > > always loaded in the delay branch of the ble instruction as >> > > > > > defined in the >> > > > > > unistd.h header file and as such never restored %r20 before >> > > > > > returning to >> > > > > > userspace: >> > > > > > ble 0x100(%sr2, %r0) >> > > > > > ldi #syscall_nr, %r20 >> > > > > > >> > > > > > This assumption is at least not true for code which uses >> > > > > > the >> > > > > > syscall() glibc >> > > > > > function, which instead uses this syntax: >> > > > > > ble 0x100(%sr2, %r0) >> > > > > > copy regX, %r20 >> > > > > > where regX depend on how the compiler optimizes the code >> > > > > > and >> > > > > > register usage. >> > > > > > >> > > > > > This patch fixes this problem by adding code to analyze how >> > > > > > the >> > > > > > syscall number >> > > > > > is loaded in the delay branch and - if needed - copy the >> > > > > > syscall number to regX >> > > > > > prior returning to userspace for the syscall restart. >> > > > > > >> > > > > > Signed-off-by: Helge Deller <deller@xxxxxx> >> > > > > > Cc: stable@xxxxxxxxxxxxxxx >> > > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> >> > > > > > >> > > > > > diff --git a/arch/parisc/kernel/signal.c >> > > > > > b/arch/parisc/kernel/signal.c >> > > > > > index dc1ea79..b0414ad 100644 >> > > > > > --- a/arch/parisc/kernel/signal.c >> > > > > > +++ b/arch/parisc/kernel/signal.c >> > > > > > @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig, >> > > > > > struct >> > > > > > pt_regs *regs, >> > > > > > int in_syscall) >> > > > > > regs->gr[28]); >> > > > > > } >> > > > > > >> > > > > > +/* >> > > > > > + * Check the delay branch in userspace how the syscall >> > > > > > number >> > > > > > gets loaded into >> > > > > > + * %r20 and adjust as needed. >> > > > > >> > > > > I'm pretty sure "Check the delay branch in userspace how the >> > > > > syscall..." >> > > > > is not an English construct. ;-) Suggested rewording: >> > > > > >> > > > > "Check how the syscall number gets loaded into %r20 within >> > > > > the delay branch in userspace and adjust as needed." >> > > >> > > Thanks! >> > > I'll change that. >> > > >> > > >> > > > > > + */ >> > > > > > + >> > > > > > +static void check_syscallno_in_delay_branch(struct pt_regs >> > > > > > *regs) >> > > > > > +{ >> > > > > > + unsigned int opcode, source_reg; >> > > > > >> > > > > Why "unsigned int" above rather than u32 ? Since we're using >> > > > > opcode as target variable for a get_user, it would be clearer >> > > > > if the type of the __user * match the type of the target >> > > > > kernel >> > > > > variable. (understood that those happen to have the same >> > > > > bitness >> > > > > and type size on all Linux architectures, but it would be >> > > > > clearer >> > > > > nevertheless). >> > > >> > > Yes, seems OK. >> > > I'll change that. >> > > >> > > > > > + u32 __user *uaddr; >> > > > > > + >> > > > > > + /* Usually we don't have to restore %r20 (the >> > > > > > system >> > > > > > call number) >> > > > > > + * because it gets loaded in the delay slot of the >> > > > > > branch external >> > > > > > + * instruction via the ldi instruction. >> > > > > > + * In some cases a register-to-register copy >> > > > > > instruction might have >> > > > > > + * been used instead, in which case we need to >> > > > > > copy >> > > > > > the syscall >> > > > > > + * number into the source register before >> > > > > > returning to >> > > > > > userspace. >> > > > > > + */ >> > > > > > + >> > > > > > + /* A syscall is just a branch, so all >> > > > > > + * we have to do is fiddle the return pointer. >> > > > > > + */ >> > > > > > + regs->gr[31] -= 8; /* delayed branching */ >> > > > > > + >> > > > > > + /* Get assembler opcode of code in delay branch */ >> > > > > > + uaddr = (unsigned int *) (regs->gr[31] + 1); >> > > > > > + get_user(opcode, uaddr); >> > > > > >> > > > > get_user() can fail due to EFAULT. This error should be >> > > > > handled here, otherwise this could lead to the following >> > > > > code using an uninitialized opcode variable, which could >> > > > > indirectly leak a few bits of kernel stack information >> > > > > to userspace (security concern). One attack vector I have >> > > > > in mind for this is ptrace(), which might be able to tweak >> > > > > those register values. >> > > >> > > Yes, generally get_user() can fail. >> > > But this would be rather strange in that case, because >> > > the syscall was started by userspace from this address. >> > > So, without the code at that address in userspace, we would >> > > never have reached this get_user(). >> > >> > Actually, that's not necessarily a safe assumption. Any memory >> > allocation in a syscall path (except GFP_ATOMIC) can trigger >> > reclaim >> > and since this is a signal restart path, that's entirely possible. >> > Reclaim could pull the backing page out from under the syscall, so >> > in >> > a low memory situation it is possible get_user() could fail with >> > EFAULT >> >> Really? >> Maybe I misunderstood...? >> So, let's say we have low memory and the kernel "swapped" out the >> userspace. >> I assume that when there is no memory pressure get_user() would >> pull the page in again, and if it's memory pressure, then with the >> assumption >> there is no memory left it probably return EFAULT (is that true?). >> But what happens then when we return to userspace? >> I expect userspace then to segfault. >> >> I will add the check for EFAULT, but just trying to understand... > > Actually, you're right, I was misremembering why these functions return > EFAULT. It's if the access is invalid. They will actually try to page > back in via the fault handler if the backing has gone. I think you could still trigger the EFAULT by having a concurrent thread running within the same process address space invoking munmap on the memory range that includes the get_user() target memory area. This would clearly be a hostile user-space application, but it's important to validate all user inputs very carefully, especially for misbehaving applications. Thanks, Mathieu > > James > >> > unless get_user_page() has been called somewhere to pin the page. >> >> Helge >> -- >> To unsubscribe from this list: send the line "unsubscribe linux >> -parisc" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html