Re: [RFC PATCH v3 01/19] landlock: Support socket access-control

Mikhail Ivanov <ivanov.mikhail1@xxxxxxxxxxxxxxxxxxx> · Fri, 10 Jan 2025 19:55:10 +0300

On 1/10/2025 7:27 PM, Günther Noack wrote:
On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
Correct, but we also agreed to use bitmasks for "family" field as well:

https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@xxxxxxxxxxxxxxxxxxx/

OK

On 1/10/2025 2:12 PM, Günther Noack wrote:
I do not understand why this convenience feature in the UAPI layer
requires a change to the data structures that Landlock uses
internally?  As far as I can tell, struct landlock_socket_attr is only
used in syscalls.c and converted to other data structures already.  I
would have imagined that we'd "unroll" the specified bitmasks into the
possible combinations in the add_rule_socket() function and then call
landlock_append_socket_rule() multiple times with each of these?

I thought about unrolling bitmask into multiple entries in rbtree, and
came up with following possible issue:

Imagine that a user creates a rule that allows to create sockets of all
possible families and types (with protocol=0 for example):
{
	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
	.families = INT64_MAX, /* 64 set bits */
	.types = INT16_MAX, /* 16 set bits */
	.protocol = 0,
},

This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
struct landlock_rule, which is used to store rules, weighs 40B. So, it
will be 40kB by a single rule. Even if we allow rules with only
"existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
by a single rule.

I understand that this may be degenerate case and most common rule will
result in less then 8 (or 4) entries in rbtree, but I think API should
be as intuitive as possible. User can expect to see the same
memory usage regardless of the content of the rule.

I'm not really sure if this case is really an issue, so I'd be glad
to hear your opinion on it.

I think there are two separate questions here:

(a) I think it is OK that it is *possible* to allocate 40kB of kernel
     memory.  At least, this is already possible today by calling
     landlock_add_rule() repeatedly.

     I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
     potential damage to the caller here?  That flag was added in the
     Landlock v19 patch set [1] ("Account objects to kmemcg.").

(b) I agree it might be counterintuitive when a single
     landlock_add_rule() call allocates more space than expected.

Mickaël, since it is already possible today (but harder), I assume
that you have thought about this problem before -- is it a problem
when users allocate more kernel memory through Landlock than they
expected?

Naive proposal:

If this is an issue, how about we set a low limit to the number of
families and the number of types that can be used in a single
landlock_add_rule() invocation?  (It tends to be easier to start with
a restrictive API and open it up later than the other way around.)

Looks like a good approach! Better to return with an error (which almost
always won't happen) and let the user to refactor the code than to
allow ruleset to eat an unexpected amount of memory.

For instance,

* In the families field, at most 2 bits may be set.
* In the types field, at most 2 bits may be set.

Or we can say that rule can contain not more than 4 combinations of
family and type.

BTW, 4 seems to be sufficient, at least for IP protocols.

In my understanding, the main use case of the bit vectors is that
there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
do not know scenarios where much more than that is needed?  With that,
we would still keep people from accidentally allocating larger amounts
of memory, while permitting the main use case.

Agreed

Having independent limits for the family and type fields is a bit
easier to understand and document than imposing a limit on the
multiplication result.

That being said, I am not a big fan of red-black trees for such simple
integer lookups either, and I also think there should be something
better if we make more use of the properties of the input ranges. The
question is though whether you want to couple that to this socket type
patch set, or rather do it in a follow up?  (So far we have been doing
fine with the red black trees, and we are already contemplating the
possibility of changing these internal structures in [2].  We have
also used RB trees for the "port" rules with a similar reasoning,
IIRC.)

I think it'll be better to have a separate series for [2] if the socket
restriction can be implemented without rbtree refactoring.

Sounds good to me. 👍

–Günther

[1] https://lore.kernel.org/all/20200707180955.53024-2-mic@xxxxxxxxxxx/