Re: Mangling shebangs in text files: How to detect them, bug in the current implementation and possible solutions

Hello,

On Wednesday, September 22, 2021 7:21:42 AM EDT Miro Hrončok wrote:
> for many releases, Fedora has the brp-mangle-shebangs BuildRoot Policy
> Script that does the following:
> 
>   1) Gets all executable files in the buildroot
>   2) Gets all "text" files from those
>   3a) Mangles shebangs that are "wrong"
>       (e.g. #!/usr/bin/env node -> #!/usr/bin/node)
>   3b) Removes executable bits from "text" files without shebangs

This is interesting. I didn't know Fedora had such a policy. I have been 
doing studies of this myself because fapolicyd wants correctly identified 
scripts.
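
To make steps 3a and 3b from the quoted list concrete, here is a minimal
Python sketch. It is not the actual BRP script: the only rewrite rule taken
from the message is the "#!/usr/bin/env node -> #!/usr/bin/node" example, and
the names mangle_shebang and without_exec_bits are hypothetical.

```python
import re

# Matches "#!/usr/bin/env <interpreter> [args]". Illustrative pattern only;
# the real script's full set of "wrong" shebangs is not shown in this thread.
_ENV_SHEBANG = re.compile(r"^#!\s*/usr/bin/env\s+(\S+)(.*)$")


def mangle_shebang(first_line: str) -> str:
    """Rewrite '#!/usr/bin/env X' to '#!/usr/bin/X'; leave other lines alone."""
    match = _ENV_SHEBANG.match(first_line.rstrip("\n"))
    if match is None:
        return first_line
    interpreter, rest = match.groups()
    if interpreter.startswith("-"):
        # Something like 'env -S ...': an option, not an interpreter. Skip it.
        return first_line
    newline = "\n" if first_line.endswith("\n") else ""
    return f"#!/usr/bin/{interpreter}{rest}{newline}"


def without_exec_bits(mode: int) -> int:
    """Step 3b: clear the user, group and other execute bits from a mode."""
    return mode & ~0o111
```

For instance, mangle_shebang("#!/usr/bin/env node\n") gives
"#!/usr/bin/node\n", and without_exec_bits(0o755) gives 0o644.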

> The idea behind this is that all "text" files that are executable need a 
> shebang and if they don't have it, something is wrong. OTOH files that are
> "binary" don't need it.
> 
> I intentionally put the terms "text" and "binary" in quotation marks, as
> the definition is somewhat fuzzy. Up until now, the script did the
> detection by utilizing the file tool to get the MIME type. If the MIME
> type starts with text/, it considered the executable to be a text file.

I find the file utility to be almost reliable. It changes how it identifies ELF 
files every couple of releases. So, to stabilize this, fapolicyd-cli uses its 
own logic to determine what kind of ELF file it finds. I also regularly find 
text/plain files where it cannot identify the language, and files reported as 
application/octet-stream that are also misidentified.
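
As an illustration of what header-based ELF detection can look like, here is a
minimal Python sketch. It is not fapolicyd's code, and the function name
elf_type and its labels are made up for the example. It relies only on the ELF
header layout: the 4-byte magic, the byte-order flag at offset 5, and the
16-bit e_type field at offset 16.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# e_type values defined by the ELF specification.
ELF_TYPES = {
    1: "relocatable",
    2: "executable",
    3: "shared object / PIE",
    4: "core",
}


def elf_type(header: bytes):
    """Return a label for the ELF e_type field, or None if this is not ELF."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return None
    # EI_DATA (offset 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        return "unknown"
    # e_type follows the 16-byte e_ident array in both 32- and 64-bit ELF.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return ELF_TYPES.get(e_type, "unknown")
```

Reading only the first 18 bytes of a file is enough for this check, so it does
not depend on how a given libmagic release words its ELF descriptions.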

> However, a bug [1] has been discovered. Some obvious text files, such as 
> executable JavaScript scripts, are detected as application/ (e.g. 
> application/javascript), and hence are not considered "text".

This is another inconsistency with libmagic that we do battle with. It can 
change on the next release. Another example of this is Python 
misidentification. In order to have any stability and correctness, fapolicyd 
ships with its own libmagic override file. You might find 
fapolicyd-cli --ftype a bit more stable. I also put new languages we discover 
in the override while we are waiting for the patch to be accepted upstream. 
And I think upstream has not accepted a couple of patches for languages 
libmagic can't detect correctly.

> If a JavaScript executable script has the #!/usr/bin/env node shebang, the
> brp-mangle-shebangs script does not mangle it.
> 
> One possible solution [2] to this problem is to limit the number of bytes
> the MIME detection reads. My experiments showed that limiting the number
> of bytes to 8 always recognizes JavaScript (and other scripting languages)
> files as text/plain and binary files as application/octet-stream. As a
> side effect, it might make the BRP script faster. However, I am not sure
> if this approach is deterministic enough.
> 
> Another solution, suggested by Florian Weimer [3], is to not detect MIME
> type at all, but use eu-elfclassify instead. The idea is quite simple:
> If (and only if) the executable file is ELF [4], it does not require a
> shebang. Instead of some fragile idea about what files are text and what
> files are binary, this is quite deterministic. It allows mangling shebangs
> of executable ZIP files etc.
> 
> I've drafted the eu-elfclassify solution in a pull request [5]. However, we
> have discovered that several non-ELF binary formats in Fedora are
> possibly legitimately executable. E.g. .exe files (for mono or wine) or
> other formats registered with the kernel [6].
> 
> We are presented with 3 possible actions:
> 
> 1) Keep the script as it is, say the text/ MIME type limitation is how this
> BRP script was scoped. Affected packages would need to correct shebangs
> manually.
> 
> 2) Limit the MIME type detection to 8 bytes and hope it will not yield
> incorrect results.
> 
> 3) Use eu-elfclassify. Consider non-ELF executables without shebangs bogus
> and document this. Packages that are affected would need to opt-out.
> 
> What do you think?

4) Maybe fapolicyd-cli has better detection? Or at least, it's more closely 
maintained. It also has its own ELF detection so that it's stable from 
release to release.
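
For comparison, the rule behind option 3 in the quoted list comes down to a
check on the first few bytes of each executable file. A rough Python sketch,
with the function name and return labels chosen only for illustration (the
drafted pull request uses eu-elfclassify, not this code):

```python
def classify_executable(head: bytes) -> str:
    """Classify an executable file by its leading bytes, as in option 3."""
    if head.startswith(b"\x7fELF"):
        return "elf"         # ELF: does not need a shebang
    if head.startswith(b"#!"):
        return "script"      # has a shebang: candidate for mangling
    return "no-shebang"      # neither: considered bogus under option 3
```

Under this rule a .exe file or any other binfmt_misc format lands in
"no-shebang", which is why such packages would need the opt-out.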

-Steve

> [1] https://bugzilla.redhat.com/1998924
> [2] https://bugzilla.redhat.com/1998924#c3
> [3] https://bugzilla.redhat.com/1998924#c4
> [4] https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
> [5] https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/145
> [6] https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html
> -- 
> Miro Hrončok
> -- 
> Phone: +420777974800
> IRC: mhroncok
> _______________________________________________
> devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxg
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure


