On Fri, Nov 26, 2021, at 12:24 PM, Florian Weimer wrote:
> * Andy Lutomirski:
>
>> On Fri, Nov 26, 2021, at 5:47 AM, Florian Weimer wrote:
>>> Distributions struggle with changing the default for vsyscall
>>> emulation because it is a clear break of userspace ABI, something
>>> that should not happen.
>>>
>>> The legacy vsyscall interface is supposed to be used by libcs only,
>>> not by applications. This commit adds a new arch_prctl request,
>>> ARCH_VSYSCALL_LOCKOUT. Newer libcs can adopt this request to signal
>>> to the kernel that the process does not need vsyscall emulation.
>>> The kernel can then disable it for the remaining lifetime of the
>>> process. Legacy libcs do not perform this call, so vsyscall remains
>>> enabled for them. This approach should achieve backwards
>>> compatibility (perfect compatibility if the assumption that only libcs
>>> use vsyscall is accurate), and it provides full hardening for new
>>> binaries.
>>
>> Why is a lockout needed instead of just a toggle? By the time an
>> attacker can issue prctls, an emulated vsyscall seems like a pretty
>> minor exploit technique. And programs that load legacy modules or
>> instrument other programs might need to re-enable them.
>
> For glibc, I plan to add an environment variable to disable the lockout.
> There's no ELF markup that would allow us to do this during dlopen.
> (And after this change, you can run an old distribution in a chroot
> for legacy software, something that the userspace ABI break prevents.)
>
> If it can be disabled, people will definitely say, “we get more complete
> hardening if we break old userspace”. I want to avoid that. (People
> will say that anyway because there's this fairly large window of libcs
> that don't use vsyscalls anymore, but have not been patched yet to do
> the lockout.)

I’m having trouble following the logic. What I mean is that I think it
should be possible to do the arch_prctl again to turn vsyscalls back on.

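To make the distinction concrete, here is a rough toy model of the two semantics under discussion. It is not kernel code and does not call arch_prctl; the names Semantics, VsyscallState and request() are made up purely for illustration. The only point is that a lockout refuses a later request to re-enable, while a toggle honors it.

```python
from enum import Enum


class Semantics(Enum):
    LOCKOUT = "lockout"  # one-way: once disabled, stays disabled
    TOGGLE = "toggle"    # two-way: a later request may re-enable


class VsyscallState:
    """Toy model of per-process vsyscall emulation state (hypothetical)."""

    def __init__(self, semantics):
        self.semantics = semantics
        self.enabled = True  # legacy default: emulation is available

    def request(self, enable):
        """Model one arch_prctl-style request; return True if honored."""
        wants_reenable = enable and not self.enabled
        if wants_reenable and self.semantics is Semantics.LOCKOUT:
            return False  # a lockout never turns emulation back on
        self.enabled = enable
        return True
```

Under the toggle model, a program that later loads a legacy module could issue request(True) and get emulation back; under the lockout model the same request is rejected for the remaining lifetime of the process.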
>
> Maybe the lockout also simplifies the implementation?
>
>> Also, the interaction with emulate mode is somewhat complex. For now,
>> let’s support this in xonly mode only. A complete implementation will
>> require nontrivial mm work. I had that implemented pre-KPTI, but KPTI
>> made it more complicated.
>
> I admit I only looked at the code in emulate_vsyscall. It has code that
> seems to deal with faults not due to instruction fetch, and also checks
> for vsyscall=emulate mode. But it seems that we don't get to this point
> for reads in vsyscall=emulate mode, presumably because the page is
> already mapped?

Yes, and, with KPTI off, it’s nontrivial to unmap it. I have code for
this, but I’m not sure the complexity is worthwhile.

>
>> Finally, /proc/self/maps should be wired up via the gate_area code.
>
> Should the "[vsyscall]" string change to something else if execution is
> disabled?

I think the line should disappear entirely, just like booting with
vsyscall=none.

>
> Thanks,
> Florian
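
For anyone who wants to check the /proc/self/maps behavior from userspace, a minimal sketch along these lines should do. It assumes the usual maps layout, where the pathname is the sixth whitespace-separated field; the function names are made up for illustration.

```python
def has_vsyscall_mapping(maps_text):
    """Return True if any line of a /proc/<pid>/maps dump names [vsyscall]."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Layout: address perms offset dev inode [pathname]
        if len(fields) >= 6 and fields[5] == "[vsyscall]":
            return True
    return False


def current_process_has_vsyscall(path="/proc/self/maps"):
    """Read this process's maps file and look for the [vsyscall] line."""
    with open(path) as f:
        return has_vsyscall_mapping(f.read())
```

If the line disappears once execution is disabled, current_process_has_vsyscall() would return False afterwards, the same result as on a system booted with vsyscall=none.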