On 2/10/20 10:32 AM, Casey Schaufler wrote:
On 2/10/2020 6:55 AM, Stephen Smalley wrote:
On 2/10/20 8:25 AM, Stephen Smalley wrote:
On 2/10/20 6:56 AM, Simon McVittie wrote:
On Mon, 03 Feb 2020 at 13:54:45 -0500, Stephen Smalley wrote:
The printable ASCII bit is based on what the dbus maintainer requested in
previous discussions.
I thought in previous discussions, we had come to the conclusion that
I can't assume it's 7-bit ASCII. (If I *can* assume that for this new
API, that's even better.)
To be clear, when I say ASCII I mean a sequence of bytes != '\0' with
their high bit unset ((x & 0x7f) == x) and the obvious mapping to/from
Unicode (bytes '\1' to '\x7f' represent codepoints U+0001 to U+007F). Is
that the same thing you mean?
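A minimal C sketch of the "ASCII" definition above: every byte of a NUL-terminated string must have its high bit clear. The name is_7bit_ascii is hypothetical and not an existing kernel or GLib API. Note the parentheses around the mask. Because == binds more tightly than & in C, the unparenthesized form x & 0x7f == x would actually compute x & (0x7f == x).

```c
#include <stdbool.h>

/* Hypothetical helper: true if every byte before the terminating '\0'
 * is in 0x01..0x7f, i.e. "7-bit ASCII" as defined above. */
static bool is_7bit_ascii(const char *s)
{
    /* Read through unsigned char so bytes >= 0x80 are not sign-extended. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0x7f) != *p)   /* high bit set: not 7-bit ASCII */
            return false;
    }
    return true;
}
```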
I mean the subset of 7-bit ASCII that satisfies isprint() using the "C" locale. That is already true for SELinux with the existing interfaces. I can't necessarily speak for the others.
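The stricter rule above can be sketched the same way. On ASCII-based systems, isprint() in the "C" locale accepts exactly the bytes 0x20 (space) through 0x7e ('~'). This sketch therefore uses a plain range test, so the result does not depend on whatever locale the calling process has selected with setlocale(). The helper name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: true if every byte of s is printable 7-bit ASCII,
 * i.e. what isprint() accepts in the "C" locale (0x20..0x7e). A range test
 * is used instead of isprint() so the caller's LC_CTYPE cannot change it. */
static bool is_printable_ascii(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p < 0x20 || *p > 0x7e)   /* control char, DEL, or high-bit byte */
            return false;
    }
    return true;
}
```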
It looks like Smack labels are similarly restricted, per Documentation/admin-guide/LSM/Smack.rst. So I guess the only one that is perhaps unclear is AppArmor, since its labels are typically derived from pathnames? Can an AppArmor label returned via its getprocattr() hook be any legal pathname?
Because attr/context (and later, SO_PEERCONTEXT) are new interfaces,
there is no need to exactly duplicate what is in attr/current (and later,
SO_PEERSEC). I already plan to omit the "mode" component of the
AppArmor data in the AppArmor hook, as was discussed earlier. I would
prefer ASCII, but if AppArmor needs bytestrings, that's what we'll
have to do.
Sadly, to avoid breaking userspace it's a byte string, because that is what path-based profile names are. AppArmor does support a more limited, non-path-based profile name, but I can't guarantee that is what userspace is using in policy.
I thought the conclusion we had come to in previous conversations was
that the LSM context is what GLib calls a "bytestring", the same as
filenames and environment variables - an opaque sequence of bytes != '\0',
with no further guarantees, and no specified encoding or mapping to/from
Unicode (most likely some superset of ASCII like UTF-8 or Latin-1,
but nobody knows which one, and they could equally well be some binary
encoding with no Unicode meaning, as long as it avoids '\0').
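Treating a label as a "bytestring" means user space can only check the one structural rule: no '\0' inside it. A minimal sketch for a length-delimited buffer such as one returned by getsockopt() follows. It assumes, as a labeled assumption rather than a documented kernel guarantee, that the buffer may or may not carry a single trailing '\0'. The name bytestring_len is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: interpret buf[0..len) as a bytestring label.
 * Assumption: the buffer may or may not end with one '\0' terminator.
 * Returns the label length excluding that terminator, or -1 if a '\0'
 * appears anywhere else. No character encoding is assumed or checked. */
static ssize_t bytestring_len(const char *buf, size_t len)
{
    if (len > 0 && buf[len - 1] == '\0')
        len--;                          /* drop one optional terminator */
    if (memchr(buf, '\0', len) != NULL)
        return -1;                      /* embedded NUL: not a bytestring */
    return (ssize_t)len;
}
```

Any non-NUL byte, including ones that are not valid in any particular encoding, is accepted. That matches how filenames and environment variables are handled.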
If I can safely assume that a new kernel <-> user-space API is constrained
to UTF-8 or a UTF-8 subset like ASCII, then I can provide more friendly
APIs for user-space features built over it. If that isn't possible, the
next best thing is a "bytestring" like filenames, environment variables,
and most kernel <-> user-space strings in general.
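If the new interface were guaranteed to carry UTF-8, user space could validate labels once and then hand them to string-oriented APIs. Below is a minimal sketch of a strict UTF-8 check in the spirit of RFC 3629. It rejects overlong forms, UTF-16 surrogates, code points above U+10FFFF, and (as an extra rule for labels) any '\0'. The function name is hypothetical. GLib users would more likely reach for g_utf8_validate().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s[0..len) is well-formed UTF-8 containing
 * no '\0' bytes. */
static bool is_valid_utf8_label(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;            /* number of continuation bytes expected */
        unsigned long cp;    /* code point being decoded */

        if (c == 0x00)
            return false;                    /* no NUL inside a label */
        if (c < 0x80) {                      /* ASCII fast path */
            i++;
            continue;
        } else if (c >= 0xc2 && c <= 0xdf) {
            n = 1; cp = c & 0x1f;
        } else if (c >= 0xe0 && c <= 0xef) {
            n = 2; cp = c & 0x0f;
        } else if (c >= 0xf0 && c <= 0xf4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;    /* 0x80..0xc1, 0xf5..0xff never start a sequence */
        }
        if (len - i <= n)
            return false;                    /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xc0) != 0x80)
                return false;                /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return false;                    /* overlong encoding */
        if (cp >= 0xd800 && cp <= 0xdfff)
            return false;                    /* UTF-16 surrogate */
        if (cp > 0x10ffff)
            return false;                    /* beyond Unicode range */
        i += n + 1;
    }
    return true;
}
```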
smcv