Justine Tunney <redbean.systems@xxxxxxxxx> writes:

> Actually Portable Executable (APE) is a file format that polyglots Windows
> Portable Executable with a UNIX Sixth Edition shell script (which predates
> the shebang line). It was introduced in 2020 and is specified formally in:
> https://github.com/jart/cosmopolitan/blob/master/ape/specification.md v0.1
> and we even got POSIX to change their rules to let us use this clever hack
> https://www.austingroupbugs.net/view.php?id=1250 Our file format is widely
> used on sites like Hugging Face to distribute ML models, and gets hundreds
> of thousands of downloads per month.
>
> This change solves an issue where binfmt_misc registrations added by other
> packages and distros claim the MZ prefix, which has prevented our software
> from running normally (where the shell passes the script to /bin/sh). It's
> possible to address this issue with a few lines of kernel code, since APEs
> are basically a thin veneer around the actual ELF header Linux cares about.
>
> So what we do here in binfmt_elf, is we check for two of the magic numbers
> specified by the APE format (MZqFpD=' and jartsr=') which if present shall
> cause the kernel to load the first 8192 bytes of the exe to decode the ELF
> header. Once the 64-byte Elf64_Ehdr is obtained loading continues normally.
>
> This change effectively introduces fat binary support to the Linux OS too.
> Multiple ELF headers may be encoded into a single executable file with our
> convention. The cosmocc C/C++ toolchain for instance produces fat binaries
> that run ARM64 and AMD64 for six OSes using Cosmopolitan Libc. However the
> technique would generalize to other libcs and it's perfectly valid for the
> binary to be Linux only and simply leverage APE for the multi-arch support.
>
> One alternative we considered is to add a feature to binfmt_misc that lets
> distro maintainers configure their rules (e.g. "run detectors") so that it
> will only prevent MZ executables from running, but shall ignore the longer
> MZqFpD=' prefix used by APE. However the effort to implement functionality
> into the binfmt_misc subsystem, and appealing to dozens of distributors to
> change their system-wide configs, would be much more work and add needless
> user-visible complexity to millions of systems.

On the face of it your #ifdefs are wrong, as they mishandle the case
where binfmt_elf is included in compat_binfmt_elf. I don't understand
why you aren't using the already existing ELF_ARCH define for those
values.

That said, my gut suggests that you would be better off with a separate
binfmt_ape.c that can contain all of the APE weirdness and not mess up
binfmt_elf.c.

I would structure binfmt_ape as either using the exported entry points
of binfmt_elf.c (aka load_elf_binary) or including binfmt_elf.c the way
compat_binfmt_elf.c does.

It also makes me nervous that you are rolling your own versions of
strstr and your own octal parser. Perhaps that is needed, but I would
expect to find such helpers already existing in the kernel's string
routines.

Beyond that, I don't currently see any alignment requirements in the
APE specification. The optimization of mmapping a file requires that
certain parts of the file be aligned, so the lack of such requirements
is pretty much a show stopper.

There are also all kinds of caveats that come with APE, such as not
supporting dynamic linking.

The more I look at APE, and at things like its encoding of the ELF
header in octal rather than in binary and its taking advantage of
everything current ELF loaders don't do, the more I really think its
differences should be separated from the rest of the ELF code.

I don't see a problem sharing code, but APE isn't ELF, and as ELF
evolves there is a real chance of breaking APE and vice versa. So
keeping all of their separate assumptions separate seems like a good
idea.
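To make the alignment point concrete, here is a minimal user-space sketch (the helper names are hypothetical and not part of any kernel API). A PT_LOAD segment can only be mapped straight from the file when its p_offset and p_vaddr are congruent modulo the page size, since mmap() takes a page-aligned file offset and that page has to land on the page containing p_vaddr. The sketch assumes the page size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can a PT_LOAD segment be mapped directly from
 * the file? The file offset and virtual address must agree modulo the
 * page size. Assumes page_size is a power of two.
 */
static bool segment_is_mmappable(uint64_t p_offset, uint64_t p_vaddr,
				 uint64_t page_size)
{
	uint64_t mask = page_size - 1;

	return (p_offset & mask) == (p_vaddr & mask);
}

/* Hypothetical helper: the page-aligned file offset passed to mmap(). */
static uint64_t segment_map_offset(uint64_t p_offset, uint64_t page_size)
{
	return p_offset & ~(page_size - 1);
}

/* Hypothetical helper: the page-aligned address the mapping starts at. */
static uint64_t segment_map_addr(uint64_t p_vaddr, uint64_t page_size)
{
	return p_vaddr & ~(page_size - 1);
}
```

With 4096-byte pages, a segment at file offset 0x1234 can be mapped at 0x401234 (mmap offset 0x1000 placed at address 0x401000), but the same segment cannot be mapped at 0x401000.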
Eric

> ---
>  fs/binfmt_elf.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 113 insertions(+), 2 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 5ae8045f4df4..d5a0aea73c99 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -816,6 +816,110 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
>  	return ret == -ENOENT ? 0 : ret;
>  }
>
> +static int ape_find_printf(const unsigned char page[8192], int i)
> +{
> +	for (; i + 8 < 8192; ++i) {
> +		if (memcmp(page + i, "printf '", 8) == 0) {
> +			return i + 8;
> +		}
> +	}
> +	return -ENOEXEC;
> +}
> +
> +static int ape_parse_octal(const unsigned char page[8192], int i, int *pc)
> +{
> +	int c;
> +	if ('0' <= page[i] && page[i] <= '7') {
> +		c = page[i++] - '0';
> +		if ('0' <= page[i] && page[i] <= '7') {
> +			c *= 8;
> +			c += page[i++] - '0';
> +			if ('0' <= page[i] && page[i] <= '7') {
> +				c *= 8;
> +				c += page[i++] - '0';
> +			}
> +		}
> +		*pc = c;
> +	}
> +	return i;
> +}
> +
> +/*
> + * Files beginning with "MZqFpD" are Actually Portable Executables,
> + * which have a printf statement in the first 8192 bytes with octal
> + * codes that specify the ELF header. APE also specifies `jartsr='`
> + * as an alternative prefix, intended for binaries that do not want
> + * to be identified as Windows executables. Like the \177ELF magic,
> + * all these prefixes decode as x86 jump instructions that could be
> + * used for 16-bit bootloaders or 32-bit / 64-bit flat executables.
> + * Most importantly they provide a fallback path for Thompson shell
> + * compatible command interpreters, which do not require a shebang,
> + * e.g. bash, zsh, fish, bourne, almquist, etc. Please note that in
> + * order to meet the requirements of POSIX.1, the single quote must
> + * be followed by a newline character, before any null bytes occur.
> + * See also: https://www.austingroupbugs.net/view.php?id=1250
> + */
> +static int ape_decode_elf(struct linux_binprm *bprm, char ebuf[64])
> +{
> +	int c;
> +	int i;
> +	int retval;
> +	int ebuf_index;
> +	int desired_machine;
> +	unsigned char *page;
> +	struct elfhdr *ehdr;
> +
> +#if defined(__aarch64__)
> +	desired_machine = EM_AARCH64;
> +#elif defined(__powerpc64__)
> +	desired_machine = EM_PPC64;
> +#elif defined(__riscv)
> +	desired_machine = EM_RISCV;
> +#else
> +	desired_machine = EM_X86_64;
> +#endif
> +
> +	page = kmalloc(8192, GFP_KERNEL);
> +	if (!page)
> +		return -ENOMEM;
> +	retval = elf_read(bprm->file, page, 8192, 0);
> +	if (retval < 0)
> +		goto out_free_page;
> +	i = 0;
> +keep_looking:
> +	retval = ape_find_printf(page, i);
> +	if (retval < 0)
> +		goto out_free_page;
> +
> +	i = retval;
> +	ebuf_index = 0;
> +	retval = -ENOEXEC;
> +	while (i + 3 < 8192) {
> +		c = page[i++];
> +		if (c == '\'')
> +			break;
> +		if (c == '\\')
> +			i = ape_parse_octal(page, i, &c);
> +		if (ebuf_index < 64) {
> +			ebuf[ebuf_index++] = c;
> +		} else {
> +			goto out_free_page;
> +		}
> +	}
> +	if (ebuf_index != 64)
> +		goto out_free_page;
> +	if (memcmp(ebuf, ELFMAG, SELFMAG) != 0)
> +		goto keep_looking;
> +	ehdr = (struct elfhdr *)ebuf;
> +	if (ehdr->e_machine != desired_machine)
> +		goto keep_looking;
> +	retval = 0;
> +
> +out_free_page:
> +	kfree(page);
> +	return retval;
> +}
> +
>  static int load_elf_binary(struct linux_binprm *bprm)
>  {
>  	struct file *interpreter = NULL; /* to shut gcc up */
> @@ -840,8 +944,15 @@ static int load_elf_binary(struct linux_binprm *bprm)
>
>  	retval = -ENOEXEC;
>  	/* First of all, some simple consistency checks */
> -	if (memcmp(elf_ex->e_ident, ELFMAG, SELFMAG) != 0)
> -		goto out;
> +	if (memcmp(elf_ex->e_ident, ELFMAG, SELFMAG) != 0) {
> +		if (memcmp(bprm->buf, "MZqFpD='", 8) != 0 &&
> +		    memcmp(bprm->buf, "jartsr='", 8) != 0)
> +			goto out;
> +		retval = ape_decode_elf(bprm, bprm->buf);
> +		if (retval < 0)
> +			goto out;
> +		retval = -ENOEXEC;
> +	}
>
>  	if (elf_ex->e_type != ET_EXEC && elf_ex->e_type != ET_DYN)
>  		goto out;