On Thu, Jan 22, 2015 at 8:32 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: > On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux > <linux@xxxxxxxxxxxxxxxx> wrote: >> On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote: >>> Well, the whole question is this: is restarting a system call like >>> usleep() really a separate system call, or is it a kernel implementation >>> detail? >>> >>> If you wanted seccomp to see this, what would be the use case? Why >>> would seccomp want to block this syscall? Does it make sense for >>> seccomp to block this syscall when it doesn't block something like >>> usleep() and then have usleep() fail just because the thread received >>> a signal? >>> >>> I personally regard the whole restart system call thing as a purely >>> kernel internal thing which should not be exposed to userland. If >>> we decide that it should be exposed to userland, then it becomes part >>> of the user ABI, and it /could/ become difficult if we needed to >>> change it in future - and I'd rather not get into the "oh shit, we >>> can't change it because that would break app X" crap. >> >> Here's a scenario where it could become a problem: >> >> Let's say that we want to use seccomp to secure some code which issues >> system calls. We determine that the app uses system calls which don't >> result in the restart system call being issued, so we decide to ask >> seccomp to block the restart system call. Some of these system calls >> that the app was using are restartable system calls. >> >> When these system calls are restarted, what we see via ptrace etc is >> that the system call simply gets re-issued as its own system call. >> >> In a future kernel version, we decide that we could really do with one >> of those system calls using the restart block feature, so we arrange >> for it to set up the restart block, and return -ERESTART_BLOCK. That's >> fine for most applications, but this app now breaks. >> >> The side effect of that breakage is that we have to revert that kernel >> change - because we've broken userland, and that's simply not allowed. >> >> Now look at the alternative: we don't make the restart syscall visible. >> This means that we hide that detail, and we actually reflect the >> behaviour that we've had for the other system call restart mechanisms, >> and we don't have to fear userspace breakage as a result of switching >> from one restart mechanism to another. >> >> I am very much of the opinion that we should be trying to limit the >> exposure of inappropriate kernel internal details to userland, because >> userland has a habbit of becoming reliant on them, and when it does, >> it makes kernel maintanence unnecessarily harder. > > I totally agree with you. :) My question here is more about what we > should do with what we currently have since we have some unexpected > combinations. > > There is already an __NR_restart_syscall syscall and it seems like > it's already part of the userspace ABI: > - it is possible to call it from userspace directly > - seccomp "sees" it > - ptrace doesn't see it > > Native ARM64 hides the restart from both seccomp and ptrace, and this > seems like the right idea, except that restart_syscall is still > callable from userspace. I don't think there's a way to make that > vanish, which means we'll always have an exposed syscall. If anything > goes wrong with it (which we've been quite close to recently[1]), > there would be no way to have seccomp filter it. > > So, at the least, I'd like arm64 to NOT hide restart_syscall from > seccomp, and at best I'd like both arm and arm64 to (somehow) entirely > remove restart_syscall from the userspace ABI so it wouldn't need to > be filtered, and wouldn't become a weird ABI hiccup as you've > described. > > I fail to imagine a way to remove restart_syscall from userspace, so > I'm left with wanting parity of behavior between ARM and ARM64 (and > x86). What's the right way forward? My sufferings are an opposite of what seccompt expects: currently I do not see any possible way (from userspace) to get syscall number which was restarted, because at some given time userspace checks the procfs syscall file and sees NR_restart there, without any chance to understand what exactly was restarted (I am talking about some kind of debugging tool which does dead-lock analysis of stuck tasks). I totally agree with Russell not to provide kernel guts to userspace, but it is already done. Too late. So probably there is a need to split syscall on two numbers: real and effective. Real number we have right now on x86. And this should be done for both ptrace and procfs syscall file. (am I right that only for ARM we already have PTRACE_SET_SYSCALL? seems we can add also real/effective getter) -- Roman -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html