On Fri, Jun 26, 2020 at 11:49 AM Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx> wrote:
> We couldn't patch Windows code because of the aforementioned DRM and
> anti-cheat mechanisms, but I suppose this limitation doesn't apply to
> Wine/native code, and if this assumption is correct, this approach could
> work.
>
> One complexity might be the consistent model for the syscall live
> patching. I don't know how much of the problem is diminished from the
> original userspace live-patching problem, but I believe at least part of
> it applies. And fencing every thread to patch would kill performance.
> Also, we cannot just patch everything at the beginning. How does rr
> handle that?

That's a good point. rr only allows one tracee thread to run at a time
for other reasons, so when we consider patching a syscall instruction,
we inspect all thread states to see if the patch would interfere with
any other thread, and skip patching it in that unlikely case. (We'll try
to patch it again next time that instruction is executed.) Wine would
need some other solution, but indeed that could be a showstopper.

> Another problem is that we will want to support i386 and other
> architectures. For int 0x80, it is trickier to encode a branch to
> another region, given the limited instruction space, and the patching
> might not be possible in hot paths.

This is no worse than for x86-64 `syscall`, which is also two bytes. We
have pretty much always been able to patch the frequently executed
syscalls by replacing both the syscall instruction and an instruction
before or after the syscall with a five-byte jump, and folding the extra
instruction into the stub.

> I did port libsyscall-intercept to
> x86-32 once and I could correctly patch glibc, but it's not guaranteed
> that an updated libc or something else won't break it.

We haven't found this to be much of a problem in rr. From time to time we
have to add new patch patterns.
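
To make the patching described above concrete, here is a rough sketch in
Python. It is not rr's actual code. The pattern list, the names
(`FOLLOWING_PATTERNS`, `encode_jmp`, `try_patch`) and the exact
"interference" test are assumptions made for illustration. It only handles
an instruction that follows the syscall, and it assumes no other thread is
running while the bytes are rewritten.

```python
import struct

SYSCALL = bytes.fromhex("0f05")  # x86-64 `syscall`, two bytes
JMP_LEN = 5                      # opcode E9 + signed 32-bit displacement
NOP = 0x90

# Illustrative patterns only (assumption): position-independent
# instructions that may directly follow a syscall and can be executed
# from the stub instead.
FOLLOWING_PATTERNS = [
    bytes.fromhex("483d00f0ffff"),  # cmp rax, -4096
    bytes.fromhex("4889c3"),        # mov rbx, rax
]


def encode_jmp(src, dst):
    """Encode `jmp rel32` located at address src, targeting address dst."""
    rel = dst - (src + JMP_LEN)  # relative to the end of the jump
    if not -2**31 <= rel < 2**31:
        raise ValueError("stub is out of rel32 range")
    return b"\xe9" + struct.pack("<i", rel)


def try_patch(code, base, off, stub_addr, other_thread_ips):
    """Patch the syscall at code[off] in place (code is a bytearray).

    Returns the bytes of the folded instruction, which the stub must
    execute after the syscall, or None if the site was skipped.
    """
    if bytes(code[off:off + 2]) != SYSCALL:
        return None
    for pattern in FOLLOWING_PATTERNS:
        end = off + 2 + len(pattern)
        if bytes(code[off + 2:end]) != pattern:
            continue
        if end - off < JMP_LEN:
            continue  # not enough room for a five-byte jump
        start_addr, end_addr = base + off, base + end
        # Skip if another thread would resume in the middle of the
        # rewritten bytes (for example, one blocked in this syscall).
        if any(start_addr < ip < end_addr for ip in other_thread_ips):
            return None
        jmp = encode_jmp(start_addr, stub_addr)
        code[off:end] = jmp + bytes([NOP]) * (end - off - JMP_LEN)
        return pattern
    return None  # unknown pattern: leave the syscall unpatched
```

A stub built from the returned bytes would perform the syscall, run the
folded instruction, and jump back to `base + end`. A `None` result leaves
the code untouched, so the caller can retry the next time that
instruction is executed.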
The good news is that if things change so a syscall can't be patched, the
result is degraded performance, not functional breakage.

> I'm not sure the benefit of not needing enhanced kernel support
> justifies the complexity and performance cost required to make this work
> reliably, in particular since the semantics for a kernel implementation
> that we are discussing doesn't seem overly intrusive and might have
> other applications like in the generic filter Andy mentioned.

That's fair --- our solution is complex. (But worth it --- for us, it's
valuable that rr works on quite old Linux kernels.) As for performance,
it performs well for us. I think we'd prefer our current approach to
Andy's hypothetical PR_SET_SYSCALL_THUNK because the latter would have
higher overhead (two trips into the kernel per syscall). We definitely
don't want to require CAP_SYS_ADMIN so that rules out any eBPF-based
alternative too.

I would love to see a low-overhead unprivileged syscall interception
mechanism that would obsolete our patching approach --- preferably one
that's also stackable so rr can record and replay processes that use the
new mechanism --- but I don't think any of the proposals here are that,
yet, unfortunately.

Rob
--
Su ot deraeppa sah dna Rehtaf eht htiw saw hcihw, efil lanrete eht uoy ot
mialcorp ew dna, ti ot yfitset dna ti nees evah ew; deraeppa efil eht. Efil
fo Drow eht gninrecnoc mialcorp ew siht - dehcuot evah sdnah ruo dna ta
dekool evah ew hcihw, seye ruo htiw nees evah ew hcihw, draeh evah ew
hcihw, gninnigeb eht morf saw hcihw taht.