On 2/10/20 6:56 AM, Simon McVittie wrote:
On Mon, 03 Feb 2020 at 13:54:45 -0500, Stephen Smalley wrote:
The printable ASCII bit is based on what the dbus maintainer requested in
previous discussions.
I thought in previous discussions, we had come to the conclusion that
I can't assume it's 7-bit ASCII. (If I *can* assume that for this new
API, that's even better.)
To be clear, when I say ASCII I mean a sequence of bytes != '\0' with
their high bit unset (x & 0x7f == x) and the obvious mapping to/from
Unicode (bytes '\1' to '\x7f' represent codepoints U+0001 to U+007F). Is
that the same thing you mean?
I mean the subset of 7-bit ASCII that satisfies isprint() using the "C"
locale. That is already true for SELinux with the existing interfaces.
I can't necessarily speak for the others.
I thought the conclusion we had come to in previous conversations was
that the LSM context is what GLib calls a "bytestring", the same as
filenames and environment variables - an opaque sequence of bytes != '\0',
with no further guarantees, and no specified encoding or mapping to/from
Unicode (most likely some superset of ASCII like UTF-8 or Latin-1,
but nobody knows which one, and they could equally well be some binary
encoding with no Unicode meaning, as long as it avoids '\0').
If I can safely assume that a new kernel <-> user-space API is constrained
to UTF-8 or a UTF-8 subset like ASCII, then I can provide more friendly
APIs for user-space features built over it. If that isn't possible, the
next best thing is a "bytestring" like filenames, environment variables,
and most kernel <-> user-space strings in general.
smcv