On Wed, Apr 27, 2022 at 10:09 AM Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> wrote:
Hi!
Having written an RFC 3164 compatible syslog daemon, I noticed that systemd
created syslog messages with non-ASCII characters.
The problem is that a remote syslogd can hardly guess the correct character
set (I'm using rsyslog to forward local messages to a remote server).
Example of such a message:
systemd-tmpfiles[3311]: [/usr/lib/tmpfiles.d/svnserve.conf:1] Line references
path below legacy directory /var/run/, updating /var/run/svnserve →
/run/svnserve; please update the tmpfiles.d/ drop-in file accordingly.
(The arrow is encoded as three bytes (\xe2\x86\x92))
RFC 5424 syslog messages require a BOM (%xEF.BB.BF) at the
beginning of the message if the message uses UTF-8:
MSG = MSG-ANY / MSG-UTF8
MSG-ANY = *OCTET ; not starting with BOM
MSG-UTF8 = BOM UTF-8-STRING
BOM = %xEF.BB.BF
Wouldn't it make sense to add such a BOM for RFC 3164 syslog messages also if
non-ASCII (i.e.: UTF-8) encoded characters are used?
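For illustration, the MSG grammar quoted above could be applied by a receiver roughly as follows. This is only a sketch: the name split_msg is hypothetical, and it merely checks for the BOM prefix without validating that the remainder really is UTF-8.

```python
# Hypothetical helper; not taken from rsyslog or systemd.
BOM = b"\xef\xbb\xbf"  # BOM = %xEF.BB.BF


def split_msg(msg: bytes):
    """Classify a MSG part according to the quoted ABNF.

    Returns ("MSG-UTF8", payload_without_bom) when the message starts
    with the BOM, otherwise ("MSG-ANY", msg) unchanged.
    """
    if msg.startswith(BOM):
        return "MSG-UTF8", msg[len(BOM):]
    return "MSG-ANY", msg
```

A message without the BOM falls under MSG-ANY, so the receiver still has to guess its encoding.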
RFC 3164 over a local socket from journald to local rsyslogd? The local rsyslogd already knows if messages are UTF-8 because the system's $LANG (well, nl_langinfo) says so. And if rsyslog can't trust that for some reason (e.g. because a user might have a different locale), then systemd-journald won't be able to trust it either, so it won't know whether it could add the BOM.
RFC 3164 over the network to a remote server? Outside the scope for systemd, since it doesn't generate the network packets; your local rsyslogd forwarder does. (Also, why RFC 3164 and not 5425?)
Generally, if a message successfully decodes as UTF-8 then it's most likely actual UTF-8 (and if UTF-8 decoding fails then you fall back to ISO 8859-1). Various old systems get away with this without needing a UTF-8 BOM.
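A minimal sketch of that heuristic, assuming the receiver has the raw message bytes in hand (the function name decode_syslog_payload is made up for this example, not something rsyslog provides):

```python
from typing import Tuple


def decode_syslog_payload(raw: bytes) -> Tuple[str, str]:
    """Try strict UTF-8 first; fall back to ISO 8859-1 on failure.

    Returns (decoded_text, encoding_used). ISO 8859-1 maps every byte
    to a character, so the fallback itself never raises.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"
```

With the arrow from the example message, decode_syslog_payload(b"\xe2\x86\x92") takes the UTF-8 branch, while a lone Latin-1 byte such as b"caf\xe9" fails strict UTF-8 decoding and takes the fallback.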
--
Mantas Mikulėnas