Re: OpenSSL Selection of Text Encoding for the -out and -text Options

Viktor Dukhovni <openssl-users@xxxxxxxxxxxx> · Sun, 19 Jan 2020 00:30:08 -0500

On Sun, Jan 19, 2020 at 02:51:53AM +0000, Douglas Morris via openssl-users wrote:

> I'm working on an ACME client written in Python3. I expect the
> certificate sent by the ACME server will be in utf-8 per RFC 8555,
> sec. 5.

Certificates are in DER or PEM form, not utf-8.  Some strings in the
certificate might be UTF-8, but that does not look relevant here.
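
As a rough illustration in Python (the language of your client), the
standard library's ssl module can convert between the two forms.  The
bytes below are a stand-in, not a real certificate; the helpers only
add or remove the base64 armor and do not parse anything.

```python
import ssl

# Stand-in bytes, NOT a real certificate: these helpers only wrap and
# unwrap base64, they do not parse the ASN.1 structure.
der = bytes(range(256))

# DER (binary) -> PEM (a str made up of ASCII characters only).
pem = ssl.DER_cert_to_PEM_cert(der)
pem_is_ascii = pem.isascii()

# PEM -> DER gives back exactly the original bytes.
round_trip = ssl.PEM_cert_to_DER_cert(pem)
```

The PEM text needs nothing beyond ASCII, and the DER form is opaque
bytes, so no text encoding choice arises for either.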

> It seems from Python Standard Library function
> sys.getfilesystemencoding() that a filesystem has a particular
> encoding for filesystem names (which is not an explicit default for
> text files).

File system metadata (file names, ...) is distinct from file content.

> I wonder if OpenSSL (and generally other software) automatically uses
> the filesystem name encoding by default for all text output.

This makes no sense.  OpenSSL does not display filenames; it reads
data from the files named in API calls and command-line options.

> I don't see anything about text encoding on the "Compilation and
> Installation" wiki page. I have OpenSSL from a Debian package. I don't
> see anything about text encoding in the configuration file
> /etc/ssl/openssl.cnf.

The issue does not come up.  OpenSSL functions that take filename
arguments use the C character arrays passed to them in API calls
verbatim.  The names are byte arrays, not strings subject to encoding
and decoding.
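
A small Python sketch of the same distinction, assuming a made-up file
name.  The name is handed to the OS as bytes, and the content is
stored exactly as written, whatever the name looks like.

```python
import os
import tempfile

# Hypothetical file name given as raw bytes (they happen to be UTF-8 for "é").
name_bytes = b"cert-\xc3\xa9.pem"
content = b"-----BEGIN CERTIFICATE-----\n"

# Python's str view of the name converts back to the same bytes.
as_str = os.fsdecode(name_bytes)
name_round_trip = os.fsencode(as_str)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(os.fsencode(tmp), name_bytes)
    with open(path, "wb") as f:
        f.write(content)
    with open(path, "rb") as f:
        stored = f.read()
```

The file name encoding only matters to the layer that turns a string
into those bytes (here Python), not to the code that opens the file.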

> What is/are and how does OpenSSL choose the text encodings for -out
> and -text, respectively.

No encoding at all.

> Information about line encoding selection would be a nice bonus.

DER files are binary, and PEM files are text files.  The platform's C
library normally determines how line-oriented data is written to files.

OpenSSL's BIO abstraction over files generally uses STDIO to perform
the underlying I/O.  So line endings are a feature of the C-library,
not OpenSSL.
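
If your client writes PEM files itself, the same consideration applies
to Python's text-mode I/O rather than to OpenSSL.  A minimal sketch,
assuming you want LF line endings on every platform (the file name and
PEM body are placeholders):

```python
import os
import tempfile

# Placeholder PEM body, not a real certificate.
pem_text = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pem")  # hypothetical file name
    # newline="\n" turns off newline translation, so "\n" is written
    # unchanged even where the platform default would be "\r\n".
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write(pem_text)
    with open(path, "rb") as f:
        raw = f.read()
```

Leaving out the newline argument lets the platform default apply,
which is the closest analogue to what the C library does for OpenSSL.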

> I would like to have all my related domain certification files in the
> same text encoding and to decode the -text output into a string value
> as reliably (and as transparently to the user) as possible. My
> fallback position is of course to just hardcode utf-8.

Here, you seem to be confusing file name encodings with file content.
PEM files are base64-encoded DER between ASCII header and footer lines,
so they are plain ASCII.  As for the output of "x509 -text", the -nameopt
and -certopt options control the output format.

At this time, you really should be using UTF-8 unconditionally.
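
A sketch of what that could look like in your client, assuming the
openssl binary is on PATH and that "-nameopt utf8" suits your needs.
The function names are made up for illustration.

```python
import subprocess


def x509_text_command(cert_path):
    """Build the argument list for `openssl x509 -text` (no shell involved)."""
    # "-nameopt utf8" is an assumption about the desired name formatting.
    return ["openssl", "x509", "-in", cert_path, "-noout", "-text",
            "-nameopt", "utf8"]


def decode_output(raw):
    """Decode captured bytes as UTF-8, replacing any invalid sequences."""
    return raw.decode("utf-8", errors="replace")


def x509_text(cert_path):
    """Run openssl and return its -text output as a str."""
    result = subprocess.run(x509_text_command(cert_path),
                            capture_output=True, check=True)
    return decode_output(result.stdout)
```

Capturing bytes and decoding explicitly keeps the result independent
of the locale the client happens to run under.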

-- 
    Viktor.