>>> Mantas Mikulenas <grawity@xxxxxxxxx> wrote on 28.04.2022 at 09:39 in message
<CAPWNY8WBtw5kJ80f4uEffYyR_CcY6=zigb8JUM7CYtkP0oWanQ@xxxxxxxxxxxxxx>:
> On Thu, Apr 28, 2022 at 10:32 AM Ulrich Windl <
> Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>> >>> Mantas Mikulenas <grawity@xxxxxxxxx> wrote on 27.04.2022 at 12:03 in message
>> <CAPWNY8XO0tu6EdpJO538qyGBJ0kOmZo5iCaoJpPc8kt4QZ+vXg@xxxxxxxxxxxxxx>:
>> > On Wed, Apr 27, 2022 at 10:09 AM Ulrich Windl <
>> > Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> >> Hi!
>> >>
>> >> Having written an RFC 3164 compatible syslog daemon, I noticed that
>> >> systemd created syslog messages with non-ASCII characters.
>> >> The problem is that a remote syslogd can hardly guess the correct
>> >> character set (I'm using rsyslog to forward local messages to a
>> >> remote server).
>> >>
>> >> Example of such a message:
>> >> systemd-tmpfiles[3311]: [/usr/lib/tmpfiles.d/svnserve.conf:1] Line
>> >> references path below legacy directory /var/run/, updating
>> >> /var/run/svnserve → /run/svnserve; please update the tmpfiles.d/
>> >> drop-in file accordingly.
>> >>
>> >> (The arrow is encoded as three bytes (\xe2\x86\x92))
>> >>
>> >> RFC 5425 syslog messages require the use of a BOM (%xEF.BB.BF) at the
>> >> beginning of a message if the message uses UTF-8:
>> >>
>> >> MSG = MSG-ANY / MSG-UTF8
>> >> MSG-ANY = *OCTET ; not starting with BOM
>> >> MSG-UTF8 = BOM UTF-8-STRING
>> >> BOM = %xEF.BB.BF
>> >>
>> >> Wouldn't it make sense to add such a BOM to RFC 3164 syslog messages
>> >> as well if non-ASCII (i.e. UTF-8 encoded) characters are used?
>> >>
>> >
>> > RFC 3164 over a local socket from journald to local rsyslogd? The local
>>
>> Actually I wasn't quite sure about the default config in SLES12.
>> It seems the flow is journald -> local rsyslogd -> remote syslogd
>>
>> > rsyslogd already knows if messages are UTF-8 because the system's $LANG
>> > (well, nl_langinfo) says so. And if rsyslog can't trust that for some
>> > reason (e.g. because a user might have a different locale), then
>> > systemd-journald won't be able to trust it either, so it won't know
>> > whether it could add the BOM.
>>
>> How could a remote syslog server know what the locale on the sending
>> system is?
>>
>
> It's not remote, it's local. I'm talking about the one that's receiving
> messages from journald on the same machine.
>
>>
>> >
>> > RFC 3164 over the network to a remote server? Outside the scope for
>> > systemd, since it doesn't generate the network packets; your local
>> > rsyslogd forwarder does. (Also, why RFC 3164 and not 5425?)
>>
>> If you look outside the world of systemd, about 99% of systems create
>> RFC 3164 type messages.
>> Some may send non-ASCII too, however.
>>
>
> Still outside the scope of systemd. Systemd doesn't send RFC 3164 messages
> over the network, either.

Correct: It does not send, because it's unable to do so. That's why I used
rsyslogd.

>
>>
>> >
>> > Generally, if a message successfully decodes as UTF-8 then it's most
>> > likely actual UTF-8 (and if UTF-8 decode fails then you fall back to
>> > ISO8859-1).
>> > Various old systems get away with this without needing a UTF-8 BOM.
>>
>> Yes, you can just output what you received, hoping the messages will be
>> presented correctly.
>> It's just like sending 8-bit e-mail without a coding system or charset
>> in the past.

What I meant to say was: Guessing the encoding is a bad concept.

>>
>
> Which is not what I was saying, but sure, whatever.
>
> --
> Mantas Mikulėnas
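
For illustration, here is a rough Python sketch of the two approaches discussed above: the explicit BOM marker from the quoted MSG grammar (which, as far as I can tell, is defined in RFC 5424, section 6) and the "try UTF-8, else fall back to ISO8859-1" heuristic. The function names decode_msg and add_bom_if_needed are hypothetical; nothing here is taken from systemd or rsyslog, and it only handles the MSG part, not the syslog header.

```python
import codecs
from typing import Tuple

# The three-byte UTF-8 BOM (%xEF.BB.BF) from the grammar quoted above.
BOM = codecs.BOM_UTF8


def decode_msg(raw: bytes) -> Tuple[str, str]:
    """Decode the MSG part of a syslog message.

    Returns (text, how), where `how` names the rule that was applied.
    Hypothetical helper for illustration only.
    """
    if raw.startswith(BOM):
        # The sender declared UTF-8 explicitly, so no guessing is needed.
        text = raw[len(BOM):].decode("utf-8", errors="replace")
        return text, "utf-8-bom"
    try:
        # Heuristic: bytes that decode cleanly as UTF-8 are most likely UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO 8859-1 maps every byte to a character, so this cannot fail,
        # but it may be the wrong guess for other 8-bit character sets.
        return raw.decode("iso-8859-1"), "iso-8859-1"


def add_bom_if_needed(text: str) -> bytes:
    """Encode text as UTF-8 and prepend the BOM only if non-ASCII is present."""
    data = text.encode("utf-8")
    if data.isascii():
        return data
    return BOM + data


if __name__ == "__main__":
    sample = "updating /var/run/svnserve \u2192 /run/svnserve"
    print(decode_msg(sample.encode("utf-8")))
    print(decode_msg(add_bom_if_needed(sample)))
    print(decode_msg(b"caf\xe9"))
```

The fallback branch is where the guessing happens: a message sent in, say, ISO 8859-15 would be silently decoded as ISO 8859-1 with no error, while the BOM branch needs no guess at all.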