On Mon, Oct 9, 2023 at 10:37 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> > On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> > > Currently the kernel provides a symlink to the executable binary, in the
> > > form of procfs file exe_file (/proc/self/exe_file for example). But what
> > > happens in interpreted scenarios (like binfmt_misc) is that such link
> > > always points to the *interpreter*. For cases of Linux binary emulators,
> > > like FEX [0] for example, it's then necessary to somehow mask that and
> > > emulate the true binary path.
> >
> > I'm absolutely no expert on that, but I'm wondering if, instead of modifying
> > exe_file and adding an interpreter file, you'd want to leave exe_file alone
> > and instead provide an easier way to obtain the interpreted file.
> >
> > Can you maybe describe why modifying exe_file is desired (about which
> > consumers are we worrying? ) and what exactly FEX does to handle that (how
> > does it mask that?).
> >
> > So a bit more background on the challenges without this change would be
> > appreciated.
>
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
>
> What actually breaks? Or rather, why does the process to examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...
>
> --
> Kees Cook

Hey there, FEX-Emu developer here. I can try to explain some of the issues.

First, to set the stage: there is a fundamental discrepancy between how ELF interpreters and binfmt_misc interpreters are represented in procfs exe. An ELF file today can be either static or dynamic; dynamic ELF files have a program header called PT_INTERP, which tells the kernel where its interpreter executable lives.
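For illustration, a minimal Python sketch (not kernel or FEX-Emu code) that reads the PT_INTERP entry out of an ELF file. It assumes a little-endian ELF64 image read fully into memory, and `find_pt_interp` is a hypothetical helper name. The path it returns is what the kernel loads as the interpreter, while procfs exe still records the launched ELF.

```python
import struct
import sys

PT_INTERP = 3  # p_type value of the interpreter program header (ELF gABI)


def find_pt_interp(data: bytes):
    """Return the PT_INTERP path of a little-endian ELF64 image, or None
    for a static binary. Hypothetical helper for illustration only."""
    # e_ident: magic, EI_CLASS == 2 (ELFCLASS64), EI_DATA == 1 (ELFDATA2LSB)
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    # e_phoff is at offset 0x20; e_phentsize and e_phnum at 0x36 and 0x38
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr begins with p_type and p_flags (u32), then p_offset,
        # p_vaddr, p_paddr and p_filesz (u64)
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            # The segment holds a NUL-terminated interpreter path
            raw = data[p_offset:p_offset + p_filesz]
            return raw.split(b"\0", 1)[0].decode()
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 pt_interp.py /usr/bin/ls
    with open(sys.argv[1], "rb") as f:
        print(find_pt_interp(f.read()))
```

Pointed at a dynamically linked binary, it prints the interpreter path named in PT_INTERP; for a static binary it prints None.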
In an x86-64 environment this is likely to be something like /lib64/ld-linux-x86-64.so.2. Today, the kernel doesn't put the PT_INTERP interpreter into procfs exe; it instead uses the dynamic ELF that was originally launched.

In contrast, a file launched through execve and handled by binfmt_misc may or may not have ELF headers at all; it is left up to the binfmt_misc interpreter to do whatever it needs. The kernel sets procfs exe to the binfmt_misc interpreter instead of the executable. This contrasting behaviour is fundamentally what this series is trying to improve. It seems like this behaviour is an oversight of the original binfmt_misc implementation rather than any deliberate attempt to make the two cases differ. The interface is already ambiguous, since it changes when an executable is run through binfmt_misc.

Some simple ways applications break:
- Applications like Chrome tend to relaunch themselves through execve with `/proc/self/exe`
  - Chrome does this. I think Flatpak or AppImage applications do this?
  - There are definitely more that I have noticed doing this.
- In the cover letter there was a link to Mesa, the OSS OpenGL/Vulkan drivers, using this
  - This library uses this interface to find out which application is running so it can apply workarounds for application bugs. There are plenty of historical applications that use the API badly or incorrectly and need specific driver workarounds.
- Some applications may use this path to open their own executable and then mmap it back in for tricky memory mirroring or dynamic linking of themselves.
  - I saw some old abandoned emulator software doing this.

There are likely more uses of this interface that I haven't noticed.

Onward to what FEX-Emu is and how it tries to work around the issue with a fairly naive hack.

FEX-Emu is an x86 and x86-64 CPU emulator that gets installed as a binfmt_misc interpreter.
It then executes x86 and x86-64 ELF files on an Arm64 device in an effectively multi-arch capable fashion. It's lightweight in that all application processes and threads are just regular Arm64 processes and threads. This is similar to how qemu-user operates. When processing system calls, FEX intercepts any call that consumes a pathname and inspects that pathname; if it is one of the ways to access procfs exe, FEX redirects it to the true x86/x86-64 executable. This is an attempt to behave as if the ELF had been executed without a binfmt_misc handler.

Pathnames captured in FEX-Emu today:
- /proc/self/exe
- /proc/<pid>/exe
- /proc/thread-self/exe

This is very fragile and doesn't cover the full range of ways applications could access procfs. Applications could end up using the *at variants of syscalls with an FD that has /proc/self/ open. They could use simple tricks like `/proc/self/../self/exe` to side-step this check. It's a game of whack-a-mole, with escalating overhead to try to close the gap, purely due to what appears to be an oversight in how binfmt_misc and PT_INTERP are handled.

Hopefully this explains why this is necessary and why reducing the differences between how PT_INTERP and binfmt_misc are represented is desirable.
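To make the fragility concrete, here is a minimal Python sketch (hypothetical, not FEX-Emu's actual code) of an exact-match check on the three captured pathnames, plus a variant that normalizes the path lexically first. `is_proc_exe_naive` and `is_proc_exe_normalized` are illustrative names, and the PID is a stand-in value.

```python
import posixpath


def is_proc_exe_naive(path: str, pid: int) -> bool:
    """Hypothetical exact-match check on the three captured pathnames."""
    return path in ("/proc/self/exe", f"/proc/{pid}/exe", "/proc/thread-self/exe")


def is_proc_exe_normalized(path: str, pid: int) -> bool:
    """Same check after lexical normalization. Still incomplete: it cannot
    see dirfd-relative lookups (*at syscalls) or symlinks into /proc."""
    return is_proc_exe_naive(posixpath.normpath(path), pid)


if __name__ == "__main__":
    pid = 1234  # stand-in for the emulated process's PID
    for p in ("/proc/self/exe", "/proc/self/../self/exe", "exe", "//proc/self/exe"):
        print(f"{p!r}: naive={is_proc_exe_naive(p, pid)} "
              f"normalized={is_proc_exe_normalized(p, pid)}")
```

Normalization rescues `/proc/self/../self/exe`, but a relative `exe` resolved against an FD for /proc/self/ still slips through, and so does `//proc/self/exe`: `posixpath.normpath` keeps exactly two leading slashes (POSIX leaves their meaning implementation-defined), while Linux resolves it like `/proc/self/exe`. Each fix invites another bypass.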