On Sep 19, 2017, at 3:19 PM, Julian Reschke <julian.reschke@xxxxxx> wrote:
Yet another reason why depending on BOMs is a bug.

Text files are hard. If you want to sign a text file, you have to know the encoding, and you have to ignore the BOM if it's present; otherwise the signature works only for the exact binary form of the file that was signed, not for equivalent versions of it. This is precisely the point Carsten's been making: BOMs are harmful. There is no way to win with them.

E.g., if a file is UTF-8, what is the signature computed over? The actual bits in the file? The Unicode characters? Is it okay if the signature breaks when you do EOL conversion, or does the signature have to ignore EOL characters? What about whitespace?

We have solved this problem numerous times, and AFAIK we haven't solved it by signing the binary, with the possible exception of subresource integrity. That works because the data never leaves the control of the thing that's responsible for validating it. OpenPGP requires canonicalization; see, e.g., section 5.2.4 of RFC 4880.

Here we are talking about data blobs that are downloaded by one thing and used by another. They can be canonicalized in a variety of ways. If we were to specify signatures for RFC text files, we would need a specification that says how the signature is computed. An interesting task, perhaps, but kind of orthogonal to this discussion. Of course we could just say "sign the representation format," but in doing so we would be totally punting on interop.
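To make the raw-bytes-versus-canonical-form point concrete, here is a minimal sketch that hashes two "equivalent" versions of the same text: one with Unix line endings and no BOM, one with a UTF-8 BOM and CR LF line endings. The function names are hypothetical, and the canonicalization rules (drop a leading BOM, map every line ending to CR LF, hash with SHA-256) are assumptions chosen for illustration. Only the CR LF step mirrors what RFC 4880 does for text-document signatures; OpenPGP does not say anything about BOMs.

```python
import hashlib


def raw_digest(data: bytes) -> str:
    # Hash the exact bytes as stored: any BOM or line-ending change alters the result.
    return hashlib.sha256(data).hexdigest()


def canonical_digest(data: bytes) -> str:
    # Hypothetical canonicalization, for illustration only.
    # 1. Decode as UTF-8; the "utf-8-sig" codec drops a leading BOM if present.
    text = data.decode("utf-8-sig")
    # 2. Normalize CR LF, lone CR, and lone LF to CR LF
    #    (the CR LF target follows OpenPGP text signatures; treating lone CR
    #    as a line ending is an assumption).
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    # 3. Re-encode as UTF-8 without a BOM and hash the canonical bytes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


unix = "Hello\nworld\n".encode("utf-8")
windows_bom = b"\xef\xbb\xbf" + "Hello\r\nworld\r\n".encode("utf-8")

print(raw_digest(unix) == raw_digest(windows_bom))              # False
print(canonical_digest(unix) == canonical_digest(windows_bom))  # True
```

Note that this sketch still leaves open the other questions raised above, such as trailing whitespace or Unicode normalization (NFC vs. NFD). Each of those is a choice a signature specification for RFC text files would have to pin down, or two verifiers would disagree about the same file.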