On Thu, Jul 09, 2020 at 07:26:18PM +0200, Arnd Bergmann wrote: > On Wed, Jul 8, 2020 at 7:50 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote: > > On 08/07/2020 15:49, Arnd Bergmann wrote: > > > On Wed, Jul 8, 2020 at 3:04 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote: > > >> On 08/07/2020 10:57, Arnd Bergmann wrote: > > >>> On Tue, Jul 7, 2020 at 8:10 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote: > > >>> > > >>> It looks like all you need here today is a single argument bit, plus > > >>> possibly some room for extensibility. I would suggest removing all > > >>> the extra bits and using a syscall like > > >>> > > >>> SYSCALL_DEFINE1(landlock_create_ruleset, u32, flags); > > >>> > > >>> I don't really see how this needs any variable-length arguments, > > >>> it really doesn't do much. > > >> > > >> We need the attr_ptr/attr_size pattern because the number of ruleset > > >> properties will increase (e.g. network access mask). > > > > > > But how many bits do you think you will *actually* need in total that > > > this needs to be a two-dimensional set of flags? At the moment you > > > only have a single bit that you interpret. > > > > I think there is a misunderstanding. For this syscall I wasn't talking > > about the "options" field but about the "handled_access_fs" field which > > has 14 bits dedicated to control access to the file system: > > https://landlock.io/linux-doc/landlock-v19/security/landlock/user.html#filesystem-flags > > Ok, got it. I didn't read far enough there. > > > The idea is to add other handled_access_* fields for other kernel object > > types (e.g. network, process, etc.). > > > > The "options" field is fine as a raw __u32 syscall argument. > > I'd still like to avoid having it variable-length and structured though. > How about having a __u32 "options" flag, plus an indirect argument > with 32 fixed-length (all 32 bit or all 64 bit) flag words, each of which > corresponds to one of the option bits? > > It's still fairly complex that way, but not as much as the version > you have right now that can be extended in multiple dimensions. > > This could possibly also help avoid the need for the get_features What is this fresh hell again, please? > syscall: If user space just passes the bitmap of all the access flags > it wants to use in a fixed-size structure, the kernel can update the > bits to mask out the ones it does not understand and write back > that bitmap as the result of create_ruleset(). > > > >>> To be on the safe side, you might split up the flags into either the > > >>> upper/lower 16 bits or two u32 arguments, to allow both compatible > > >>> (ignored by older kernels if flag is set) and incompatible (return error > > >>> when an unknown flag is set) bits. > > >> > > >> This may be a good idea in general, but in the case of Landlock, because > > >> this kind of (discretionary) sandboxing should be a best-effort security > > >> feature, we should avoid incompatible behavior. In practice, every > > >> unknown bit returns an error because userland can probe for available > > >> bits thanks to the get_features command. This kind of (in)compatibility > > >> can then be handled by userland. > > > > > > If there are not going to be incompatible extensions, then just ignore > > > all unknown bits and never return an error but get rid of the user > > > space probing that just complicates the interface. > > > > There was multiple discussions about ABI compatibility, especially > > inspired by open(2) vs. openat2(2), and ignoring flags seems to be a bad > > idea. In the "sandboxer" example, we first probe the supported features > > and then mask unknown bits (i.e. access rights) at run time in userland. > > This strategy is quite straightforward, backward compatible and > > future-proof. > > For behavior changing flags, I agree they should be seen as > incompatible flags (i.e. return an error if an unknown bit is set). > > However, for the flags you pass in in an allowlist, treating them > as compatible (i.e. ignore any unknown flags, allowing everything > you are not forbidding already) seems completely reasonable > to me. Do you foresee user space doing anything other than masking > out the bits that the kernel doesn't know about? If not, then doing > it in the kernel should always be simpler. > > > >> I suggest this syscall signature: > > >> SYSCALL_DEFINE3(landlock_create_ruleset, __u32, options, const struct > > >> landlock_attr_ruleset __user *, ruleset_ptr, size_t, ruleset_size); > > > > > > The other problem here is that indirect variable-size structured arguments > > > are a pain to instrument with things like strace or seccomp, so you > > > should first try to use a fixed argument list, and fall back to a fixed > > > structure if that fails. > > > > I agree that it is not perfect with the current tools but this kind of > > extensible structs are becoming common and well defined (e.g. openat2). > > Moreover there is some work going on for seccomp to support "extensible > > argument" syscalls: https://lwn.net/Articles/822256/ > > openat2() is already more complex than we'd ideally want, I think we > should try hard to make new syscalls simpler than that, following the > rule that any interface should be as simple as possible, but no simpler. Extensible structs are targeted at system calls that are either known to grow a lot of features or we already have prior versions that have accumulated quite a lot of features or that by their nature need to be more complex. openat2() is not really complex per se (At least not yet. It will likely grow quite a bit in the future...). The kernel now has infrastructure since clone3() and later generalized with openat2() and is well-equipped with a consistent api to deal with such syscalls so I don't see how this is really an issue in the first place. Yes, syscalls should be kept as simple as possible but we don't need to lock us into a "structs as arguments" are inherently bad mindset. That will also cause us to end up with crappy syscalls that are awkward to use for userspace. (Second-level pointers is a whole different issue of course.) (Arnd, you should also note that we're giving a talk at kernel summit about new syscall conventions and I'm syncing with Florian who'll be talking about the userspace side and requirements of this.) Christian > > > >>>> +static int syscall_add_rule_path_beneath(const void __user *const attr_ptr, > > >>>> + const size_t attr_size) > > >>>> +{ > > >>>> + struct landlock_attr_path_beneath attr_path_beneath; > > >>>> + struct path path; > > >>>> + struct landlock_ruleset *ruleset; > > >>>> + int err; > > >>> > > >>> Similarly, it looks like this wants to be > > >>> > > >>> SYSCALL_DEFINE3(landlock_add_rule_path_beneath, int, ruleset, int, > > >>> path, __u32, flags) > > >>> > > >>> I don't see any need to extend this in a way that wouldn't already > > >>> be served better by adding another system call. You might argue > > >>> that 'flags' and 'allowed_access' could be separate, with the latter > > >>> being an indirect in/out argument here, like > > >>> > > >>> SYSCALL_DEFINE4(landlock_add_rule_path_beneath, int, ruleset, int, path, > > >>> __u64 *, allowed_acces, __u32, flags) > > >> > > >> To avoid adding a new syscall for each new rule type (e.g. path_beneath, > > >> path_range, net_ipv4_range, etc.), I think it would be better to keep > > >> the attr_ptr/attr_size pattern and to explicitely set a dedicated option > > >> flag to specify the attr type. > > >> > > >> This would look like this: > > >> SYSCALL_DEFINE4(landlock_add_rule, __u32, options, int, ruleset, const > > >> void __user *, rule_ptr, size_t, rule_size); > > >> > > >> The rule_ptr could then point to multiple types like struct > > >> landlock_attr_path_beneath (without the current ruleset_fd field). > > > > > > This again introduces variable-sized structured data. How many different > > > kinds of rule types do you think there will be (most likely, and maybe an > > > upper bound)? > > > > I don't know how many rule types will come, but right now I think it may > > be less than 10. > > Ok, > > > > Could (some of) these be generalized to use the same data structure? > > > > I don't think so, file path and network addresses are an example of very > > different types. > > Clearly the target object is something different, but maybe there is > enough commonality to still make them fit into a more regular form. > > For the file system case, you have an identify for an object > (the file descriptor) and the '__u64 allowed_access'. I would > expect that the 'allowed_access' concept is generic enough that > you can make it a direct argument (32 bit register arg, or pointer > to a __u64). Do you expect others to need something besides > an object identifier and a permission bitmask? Maybe it could > be something like > > SYSCALL_DEFINE4(landlock_add_rule, int, ruleset, __u32, options, > const void __user *, object, const __u64 __user > *, allowed_access, > __u32, flags); > > with a fixed-length 'object' identifier type (file descriptor, > sockaddr_storage, ...) for each option. > > Arnd