Re: Mangling shebangs in text files: How to detect them, bug in the current implementation and possible solutions

Thank you for re-examining this. I need to mangle shebangs manually in %nodejs_sitelib in the NodeJS packages I maintain [1], and I had never taken the time to determine whether this was actually a bug or not. It turns out this limitation was known in general terms at the very beginning [2].

If you choose option one, it should still be possible to teach the script about some common “actually-text” MIME types like application/javascript in order to reduce the number of cases that need to be handled manually.
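
To illustrate what I mean, here is a minimal sketch in Python rather than in the language of the actual BRP script. The names is_text_mime and EXTRA_TEXT_MIME_TYPES are hypothetical, and the allowlist contains only the one type mentioned in this thread; a real list would need to be vetted.

```python
# Hypothetical allowlist of MIME types that are "actually text" even though
# they do not start with "text/". Only application/javascript comes from this
# thread; any further entries would need to be checked against file(1) output.
EXTRA_TEXT_MIME_TYPES = frozenset({
    "application/javascript",
})


def is_text_mime(mime_type: str) -> bool:
    """Return True if a file with this MIME type should be treated as "text"."""
    mime_type = mime_type.strip().lower()
    # Current behavior as described in the thread: anything under text/ counts.
    if mime_type.startswith("text/"):
        return True
    # Proposed extension: a small set of known text-like application/ types.
    return mime_type in EXTRA_TEXT_MIME_TYPES
```

With this, application/javascript would be mangled like any text/ type, while application/octet-stream would still be skipped.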

Option two makes me uncomfortable, because it’s hard to predict what kinds of inputs might break it. It could still turn out to be unproblematic, or even the best choice in practice.

I like option three pretty well because it’s easy to understand the limitations of the script; it’s fairly easy to document the exceptional cases, with a reminder of how to disable the script [3]; and the types of packages that need to work around its limitations are confined to particular domains or are already fairly exceptional.

In the end, my personal opinions on the choice are rather weak, but I appreciate the careful consideration.

– Ben Beasley

[1] https://src.fedoraproject.org/rpms/fx/blob/be060a34853527ecf60991e71133d84f07cf3f3a/f/fx.spec#_78
[2] https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/9#comment-4424
[3] https://pagure.io/packaging-committee/issue/738#comment-489301

On 9/22/21 07:21, Miro Hrončok wrote:
Hello,

For many releases, Fedora has had the brp-mangle-shebangs BuildRoot Policy Script, which does the following:

  1) Gets all executable files in the buildroot
  2) Gets all "text" files from those
  3a) Mangles shebangs that are "wrong"
      (e.g. #!/usr/bin/env node -> #!/usr/bin/node)
  3b) Removes executable bits from "text" files without shebangs

The idea behind this is that all "text" files that are executable need a shebang and if they don't have it, something is wrong. OTOH files that are "binary" don't need it.
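
Step 3a can be sketched roughly as follows. This is a simplified Python illustration, not the actual script: the function name mangle_shebang is hypothetical, it only handles the env-style case from the example above, and it assumes the interpreter is installed in /usr/bin.

```python
def mangle_shebang(first_line: str) -> str:
    """Rewrite '#!/usr/bin/env PROG ARGS' to '#!/usr/bin/PROG ARGS'.

    Any other line (no shebang, a direct interpreter path, or env called
    with options such as -S) is returned unchanged.
    """
    # Keep a trailing newline, if any, so the line can be written back as-is.
    body = first_line.rstrip("\n")
    newline = first_line[len(body):]

    if not body.startswith("#!"):
        return first_line

    parts = body[2:].split()
    if len(parts) < 2 or parts[0] not in ("/usr/bin/env", "/bin/env"):
        return first_line
    if parts[1].startswith("-"):
        # env options are out of scope for this sketch.
        return first_line

    # Assumption: the interpreter named after env lives in /usr/bin.
    return "#!/usr/bin/" + " ".join(parts[1:]) + newline
```

For example, mangle_shebang("#!/usr/bin/env node") returns "#!/usr/bin/node", while "#!/bin/sh" is left alone.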

I intentionally put the terms "text" and "binary" in quotation marks, as the definition is somewhat fuzzy. Up until now, the script did the detection by utilizing the file tool to get the MIME type. If the MIME type starts with text/, it considered the executable to be a text file.

However, a bug [1] has been discovered. Some obvious text files, such as executable JavaScript scripts, are detected as application/ (e.g. application/javascript), and hence are not considered "text". If an executable JavaScript script has the #!/usr/bin/env node shebang, the brp-mangle-shebangs script does not mangle it.

One possible solution [2] to this problem is to limit the number of bytes the MIME detection reads. My experiments showed that limiting the number of bytes to 8 always recognizes JavaScript (and other scripting languages) files as text/plain and binary files as application/octet-stream. As a side effect, it might make the BRP script faster. However, I am not sure if this approach is deterministic enough.

Another solution, suggested by Florian Weimer [3], is to not detect MIME type at all, but use eu-elfclassify instead. The idea is quite simple: If (and only if) the executable file is ELF [4], it does not require a shebang. Instead of some fragile idea about what files are text and what files are binary, this is quite deterministic. It allows mangling shebangs of executable ZIP files etc.
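
The "ELF or needs a shebang" rule can be sketched in Python as below. Note that this is only a stand-in: it checks the four-byte ELF magic number at the start of the file, which is far cruder than what eu-elfclassify does. The function names are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf_header(head: bytes) -> bool:
    """Return True if the given leading bytes start with the ELF magic."""
    return head[:4] == ELF_MAGIC


def needs_shebang(path: str) -> bool:
    """Under the ELF-only rule, every executable that is not ELF needs a shebang.

    Assumption: the caller has already established that the file is executable.
    """
    with open(path, "rb") as f:
        return not is_elf_header(f.read(4))
```

Under this rule an executable ZIP file or JavaScript file is treated like any other script, whatever MIME type the file tool reports for it.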

I've drafted the eu-elfclassify solution in a pull request [5]. However, we have discovered that several non-ELF binary formats in Fedora may be legitimately executable, for example .exe files (for Mono or Wine) or other formats registered with the kernel [6].

We are presented with 3 possible actions:

1) Keep the script as it is and declare that the text/ MIME type limitation is how this BRP script was scoped. Affected packages would need to correct shebangs manually.

2) Limit the MIME type detection to 8 bytes and hope it will not yield incorrect results.

3) Use eu-elfclassify. Consider non-ELF executables without shebangs bogus and document this. Affected packages would need to opt out.

What do you think?

[1] https://bugzilla.redhat.com/1998924
[2] https://bugzilla.redhat.com/1998924#c3
[3] https://bugzilla.redhat.com/1998924#c4
[4] https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[5] https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/145
[6] https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html