George N. White III:
> In some cases, "established behaviour" means text files using ASCII
> character sets, which creates problems for the majority of the world,
> and should be considered "broken".  In this day and age, we need to
> pay attention to text encodings.

The notion of "plain text" has been broken for decades. Virtually every computer system used its own text format, with no content-type identification in the file. It was generally presumed that your text file used the same encoding as the rest of the text on your system, but that failed badly when you exchanged files with people in other countries.

These days we usually use UTF, which is only barely compatible with ASCII. It is compatible only if the text sticks to characters 0 to 127 (UTF-8 encodes those exactly as ASCII does). Anything above that is not ASCII, and Unicode's repertoire runs to well over a hundred thousand characters (hence why I said ASCII and UTF are only *barely* compatible).

Many computers using 7-bit text falsely described their own non-ASCII encoding as ASCII, and others did the same with their 8-bit non-ASCII encodings. And there are several different UTF schemes, too (UTF-8, UTF-16 and UTF-32, the latter two in both big- and little-endian byte order).

Some computers kept an associated metadata file with information about the file (Amigas with a ".info" file alongside the file you're interested in, old Macs with their separate data and resource forks). There was a certain amount of logic in that, but it made sending files a nuisance: you had to remember to send both parts, or the system had to manage that for you. And you're still dependent on the recipient being able to handle it, which they may not.

UTF can often be identified by looking at the first two to four bytes (the byte-order mark, or BOM). If that's missing, you have to parse more of the file and guess what it might be by looking for common byte sequences (web browsers have done that for many years, and got it wrong for many years, too). But software that doesn't check for a BOM and presumes ASCII will be surprised by extraneous content at the start of the file.
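To make the BOM point concrete, here is a rough Python sketch of that kind of detection: check for a BOM first, and only if there isn't one, fall back to guessing from the content. The names `guess_encoding` and `read_text` are made up for this example, and the Latin-1 fallback is an arbitrary assumption. Real detectors (such as the ones in web browsers) use far more elaborate heuristics, and this crude one will misjudge, for example, BOM-less UTF-16.

```python
import codecs

# Hypothetical helpers for illustration; not part of any library.
# Byte-order marks, longest first: the UTF-32-LE BOM (FF FE 00 00) begins
# with the UTF-16-LE BOM (FF FE), so the 4-byte marks must be tested first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> tuple[str, int]:
    """Return (encoding name, BOM length in bytes) for raw file contents."""
    # 1. A BOM, if present, names the UTF scheme outright.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # 2. No BOM: guess from the content itself.
    if data.isascii():
        return "ascii", 0  # only bytes 0-127: valid ASCII and valid UTF-8
    try:
        data.decode("utf-8")
        return "utf-8", 0
    except UnicodeDecodeError:
        # Assumption: anything else is treated as a legacy 8-bit code page.
        # Latin-1 never fails to decode, but it may well be the wrong guess.
        return "latin-1", 0


def read_text(data: bytes) -> str:
    """Decode raw bytes with the guessed encoding, dropping any BOM."""
    encoding, bom_len = guess_encoding(data)
    return data[bom_len:].decode(encoding)


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + "naïve".encode("utf-8")
    print(guess_encoding(raw))    # ('utf-8', 3)
    print(read_text(raw))         # naïve
    # A reader that skips detection and assumes an 8-bit charset
    # shows the BOM as junk at the start of the text:
    print(raw.decode("latin-1"))  # ï»¿naÃ¯ve
```

The last line is the "extraneous content" in practice: the three UTF-8 BOM bytes come out as "ï»¿" in front of the real text.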
Using a non-text format for data and config files is more robust. The file can start with a header that unambiguously identifies it (data format, the application it's intended for, etc.), so the description and the data travel together in a single self-describing file. For what it's worth, a binary file can also contain text, stored directly as itself; nothing precludes that. It can start with the identifying header, followed by text that could be parsed by more than the original application (for future-proofing).

Applications that save and use binary data can also handle versioning better, if thought went into supporting that. If the data has changed format over time, and identifies which version it is, the application can read older data the way it used to, differently from how it handles the current format, and get that right.

--
NB: All unexpected mail to my mailbox is automatically deleted.
I will only get to see the messages that are posted to the list.

The following system info data is generated fresh for each post:

uname -rsvp

Linux 6.2.15-100.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 11 16:51:53 UTC 2023 x86_64

--
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx