On 1/10/2025 2:12 PM, Günther Noack wrote:
Happy New Year!
Happy New Year! Glad to see you :)
On Tue, Dec 24, 2024 at 07:55:01PM +0300, Mikhail Ivanov wrote:
The bitmask approach leads to a complete refactoring of socket rule
storage. This shouldn't be a big issue, since we're gonna need
multiplexer for insert_rule(), find_rule() with a port range feature
anyway [1]. But it seems that the best approach of storing rules
composed of bitmasks is to store them in linked list and perform
linear scan in landlock_find_rule(). Any other approach is likely to
be too heavy and complex.
Do you think such refactoring is reasonable?
[1] https://github.com/landlock-lsm/linux/issues/16
The way I understood it in your mail from Nov 28th [1], I thought that the
bitmasks would only exist at the UAPI layer so that users could more
conveniently specify multiple "types" at the same time. In other
words, a rule which is now expressed as
{
.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
.family = AF_INET,
.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
.protocol = 0,
},
used to be expressed like this (without bitmasks):
{
.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
.family = AF_INET,
.type = SOCK_STREAM,
.protocol = 0,
},
{
.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
.family = AF_INET,
.type = SOCK_DGRAM,
.protocol = 0,
},
Correct, but we also agreed to use bitmasks for "family" field as well:
https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@xxxxxxxxxxxxxxxxxxx/
I do not understand why this convenience feature in the UAPI layer
requires a change to the data structures that Landlock uses
internally? As far as I can tell, struct landlock_socket_attr is only
used in syscalls.c and converted to other data structures already. I
would have imagined that we'd "unroll" the specified bitmasks into the
possible combinations in the add_rule_socket() function and then call
landlock_append_socket_rule() multiple times with each of these?
I thought about unrolling bitmask into multiple entries in rbtree, and
came up with following possible issue:
Imagine that a user creates a rule that allows to create sockets of all
possible families and types (with protocol=0 for example):
{
.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
.families = INT64_MAX, /* 64 set bits */
.types = INT16_MAX, /* 16 set bits */
.protocol = 0,
},
This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
struct landlock_rule, which is used to store rules, weighs 40B. So, it
will be 40kB by a single rule. Even if we allow rules with only
"existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
by a single rule.
I understand that this may be degenerate case and most common rule will
result in less then 8 (or 4) entries in rbtree, but I think API should
be as intuitive as possible. User can expect to see the same
memory usage regardless of the content of the rule.
I'm not really sure if this case is really an issue, so I'd be glad
to hear your opinion on it.
I also initially thought that it would be difficult to handle errors
when adding rule. But it seems that it's not gonna be an issue with
correctly implemented removal (this will result in additional method in
ruleset.c and small wrapper over rule structure that would not affect
ruleset domain implementation).
That being said, I am not a big fan of red-black trees for such simple
integer lookups either, and I also think there should be something
better if we make more use of the properties of the input ranges. The
question is though whether you want to couple that to this socket type
patch set, or rather do it in a follow up? (So far we have been doing
fine with the red black trees, and we are already contemplating the
possibility of changing these internal structures in [2]. We have
also used RB trees for the "port" rules with a similar reasoning,
IIRC.)
I think it'll be better to have a separate series for [2] if the socket
restriction can be implemented without rbtree refactoring.
Regarding the port range feature, I am also not sure whether the data
structure for that would even be similar? Looking for a containment
in a set of integer ranges is a different task than looking for an
exact match in a non-contiguous set of integers.
It seems like it would be better to have a different structure for
non-ranged lookups if it results in less memory and lookup duration.
First, we need to check possible candidates for both cases.
In any case, I feel that for now, an exact look up in the RB tree
would work fine as a generic solution (especially considering that the
set of added rules is probably usually small). IMHO, finding a more
appropriate data structure might be a can of worms that could further
delay the patch set and which might better be discussed separately.
WDYT?
I agree if you think that worst case presented above is not a big issue.
–Günther
[1] https://lore.kernel.org/all/eafd855d-2681-8dfd-a2be-9c02fc07050d@xxxxxxxxxxxxxxxxxxx/
[2] https://github.com/landlock-lsm/linux/issues/1