>>> Lennart Poettering <lennart@xxxxxxxxxxxxxx> wrote on 27.04.2022 at 13:10 in
message <Ymkksza00BPhDMGq@gardel-login>:
> On Wed, 27.04.22 09:09, Ulrich Windl (Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx) wrote:
>
>> Hi!
>>
>> Having written an RFC 3164 compatible syslog daemon, I noticed that systemd
>> created syslog messages with non-ASCII characters.
>> The problem is that a remote syslogd can hardly guess the correct character
>> set (I'm using rsyslog to forward local messages to a remote
>> server).
>
> It's 2022. I think at this point, software should always assume the
> charset is UTF-8 if it doesn't have a reason to believe otherwise.
>
> It's kinda what we started to do all across our codebase really. We'll
> use UTF-8 for everything by default. For some things where people
> complain sufficiently loudly we'll conditionalize them so that we have
> some fallback in place if we know for sure UTF-8 is not OK, but the
> default we do is always and everywhere UTF-8.
>
>> Example of such message:
>> systemd-tmpfiles[3311]: [/usr/lib/tmpfiles.d/svnserve.conf:1] Line references
>> path below legacy directory /var/run/, updating /var/run/svnserve →
>> /run/svnserve; please update the tmpfiles.d/ drop-in file accordingly.
>>
>> (The arrow is encoded as three bytes (\xe2\x86\x92))
>>
>> RFC 5425 syslog messages require the use of a BOM (%xEF.BB.BF) at the
>> beginning of a message if the message used UTF-8:
>
> We do not implement RFC 5425, as glibc doesn't support that. In fact
> we don't even implement RFC 3164 in full, since glibc generates the
> messages in a very specific format only.
>
>>
>> MSG = MSG-ANY / MSG-UTF8
>> MSG-ANY = *OCTET ; not starting with BOM
>> MSG-UTF8 = BOM UTF-8-STRING
>> BOM = %xEF.BB.BF
>>
>> Wouldn't it make sense to add such a BOM for RFC 3164 syslog messages also if
>> non-ASCII (i.e.: UTF-8) encoded characters are used?
>
> There's plenty of software that doesn't support RFC 5425, and putting a
> BOM first is certainly not implemented in any of those. I think BOM is
> hideous and defaulting to UTF-8 generally safe. If we'd put BOM first,
> these messages would likely not be compatible with a large variety of
> consumers anymore, because they can't handle BOM. This would be worse

That's a non-argument: You say you don't adhere to any of the standards, and
claim that if you did, things would break. ???

> than the status quo I am sure, since if we just send UTF-8 things
> should generally just work fine for any software that either a) also
> defaults to UTF-8 when encountering an 8bit char or b) is agnostic to
> charsets and just passes data through.

Yes, bury your head in the sand, hoping the problems are gone when you look up
again... ;-)

>
> So, yeah, we might be stretching standards and tradition a bit, but
> it actually works out quite well so far.

A good argument for driving without a safety belt, BTW.

Regards,
Ulrich

>
> Lennart
>
> --
> Lennart Poettering, Berlin
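
For reference, the two receiver-side policies argued over in this thread (honour
a leading BOM if the sender wrote one, otherwise default to UTF-8) can be
combined in a few lines. The following is only a minimal Python sketch under
stated assumptions: it is not what systemd, glibc or rsyslog actually do, the
function name decode_syslog_msg is hypothetical, and the Latin-1 fallback is an
arbitrary site-specific choice, not something proposed in the thread.

```python
import codecs
from typing import Tuple


def decode_syslog_msg(payload: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Decode the MSG part of a syslog line; return (text, how_it_was_decoded).

    Hypothetical helper for illustration only.
    """
    # Case 1: the sender followed the BOM rule (MSG-UTF8 = BOM UTF-8-STRING).
    if payload.startswith(codecs.BOM_UTF8):
        body = payload[len(codecs.BOM_UTF8):]
        return body.decode("utf-8", errors="replace"), "bom"

    # Case 2: no BOM; assume UTF-8 unless the bytes prove otherwise.
    try:
        return payload.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Case 3: not valid UTF-8; 'fallback' is an assumed legacy charset.
        return payload.decode(fallback, errors="replace"), fallback


# The arrow from the example message, encoded as \xe2\x86\x92.
msg = b"updating /var/run/svnserve \xe2\x86\x92 /run/svnserve"
print(decode_syslog_msg(msg))                    # decoded via the UTF-8 default
print(decode_syslog_msg(codecs.BOM_UTF8 + msg))  # decoded via the BOM marker
print(decode_syslog_msg(b"caf\xe9"))             # invalid UTF-8, uses fallback
```

A receiver written this way accepts messages with or without a BOM, so the
compatibility concern applies only to consumers that neither strip the BOM nor
pass the bytes through unchanged.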