On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
> On 17/03/2020 20:45, Jann Horn wrote:
> > On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
> >> On 17/03/2020 17:19, Jann Horn wrote:
> >>> On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
> >>>> On 10/03/2020 00:44, Jann Horn wrote:
> >>>>> On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
> >>
> >> [...]
> >>
> >>>>> Aside from those things, there is also a major correctness issue where
> >>>>> I'm not sure how to solve it properly:
> >>>>>
> >>>>> Let's say a process installs a filter on itself like this:
> >>>>>
> >>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> >>>>> ACCESS_FS_ROUGHLY_WRITE};
> >>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> >>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> >>>>> struct landlock_attr_path_beneath path_beneath = {
> >>>>>   .ruleset_fd = ruleset_fd,
> >>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >>>>>   .parent_fd = open("/tmp/foobar", O_PATH),
> >>>>> };
> >>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> >>>>> sizeof(path_beneath), &path_beneath);
> >>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> >>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> >>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> >>>>> sizeof(attr_enforce), &attr_enforce);
> >>>>>
> >>>>> At this point, the process is not supposed to be able to write to
> >>>>> anything outside /tmp/foobar, right? But what happens if the process
> >>>>> does the following next?
> >>>>>
> >>>>> struct landlock_attr_ruleset ruleset = { .handled_access_fs =
> >>>>> ACCESS_FS_ROUGHLY_WRITE};
> >>>>> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
> >>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
> >>>>> struct landlock_attr_path_beneath path_beneath = {
> >>>>>   .ruleset_fd = ruleset_fd,
> >>>>>   .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
> >>>>>   .parent_fd = open("/", O_PATH),
> >>>>> };
> >>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
> >>>>> sizeof(path_beneath), &path_beneath);
> >>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> >>>>> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> >>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
> >>>>> sizeof(attr_enforce), &attr_enforce);
> >>>>>
> >>>>> As far as I can tell from looking at the source, after this, you will
> >>>>> have write access to the entire filesystem again. I think the idea is
> >>>>> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges,
> >>>>> not increase them, right?
> >>>>
> >>>> There is an additional check in syscall.c:get_path_from_fd(): it is
> >>>> forbidden to add a rule with a path which is not accessible (according
> >>>> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(),
> >>>> but this is definitely not perfect.
> >>>
> >>> Ah, I missed that.
> >>>
> >>>>> I think the easy way to fix this would be to add a bitmask to each
> >>>>> rule that says from which ruleset it originally comes, and then let
> >>>>> check_access_path() collect these bitmasks from each rule with OR, and
> >>>>> check at the end whether the resulting bitmask is full - if not, at
> >>>>> least one of the rulesets did not permit the access, and it should be
> >>>>> denied.
> >>>>>
> >>>>> But maybe it would make more sense to change how the API works
> >>>>> instead, and get rid of the concept of "merging" two rulesets
> >>>>> together?
> >>>>> Instead, we could make the API work like this:
> >>>>>
> >>>>> - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
> >>>>>   ->private_data contains a pointer to the old ruleset of the process,
> >>>>>   as well as a pointer to a new empty ruleset.
> >>>>> - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
> >>>>>   permitted by the old ruleset, then adds the rule to the new ruleset
> >>>>> - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
> >>>>>   ->private_data doesn't match the current ruleset of the process, then
> >>>>>   replaces the old ruleset with the new ruleset.
> >>>>>
> >>>>> With this, the new ruleset is guaranteed to be a subset of the old
> >>>>> ruleset because each of the new ruleset's rules is permitted by the
> >>>>> old ruleset. (Unless the directory hierarchy rotates, but in that case
> >>>>> the inaccuracy isn't much worse than what would've been possible
> >>>>> through RCU path walk anyway AFAIK.)
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>
> >>>> I would prefer to add the same checks you described at first (with
> >>>> check_access_path), but only when creating a new ruleset with
> >>>> merge_ruleset() (which should probably be renamed). This avoids relying
> >>>> on a parent ruleset/domain until the enforcement, which is the case
> >>>> anyway.
> >>>> Unfortunately this doesn't work for some cases with bind mounts. Because
> >>>> check_access_path() goes through one path, another (bind mounted) path
> >>>> could be illegitimately allowed.
> >>>
> >>> Hmm... I'm not sure what you mean. At the moment, landlock doesn't
> >>> allow any sandboxed process to change the mount hierarchy, right? Can
> >>> you give an example where this would go wrong?
> >>
> >> Indeed, a Landlocked process must not be able to change its mount
> >> namespace layout. However, bind mounts may already exist.
> >> Let's say a process sandboxes itself to only access /a in a read-write
> >> way.
> >
> > So, first policy:
> >
> > /a RW
> >
> >> Then, this process (or one of its children) adds a new restriction
> >> on /a/b to only be able to read this hierarchy.
> >
> > You mean with the second policy looking like this?
>
> Right.
>
> >
> > /a   RW
> > /a/b R
> >
> > Then the resulting policy would be:
> >
> > /a   RW policy_bitmask=0x00000003 (bits 0 and 1 set)
> > /a/b R  policy_bitmask=0x00000002 (bit 1 set)
> > required_bits=0x00000003 (bits 0 and 1 set)
> >
> >> The check at insertion
> >> time would allow this because this access right is a subset of the
> >> access right allowed with the parent directory. However, if /a/b is bind
> >> mounted somewhere else, let's say in /private/b, then the second
> >> enforcement just gave new access rights to this hierarchy too.
> >
> > But with the solution I proposed, landlock's path walk would see
> > something like this when accessing a file at /private/b/foo:
> > /private/b/foo <no rules>
> >   policies seen until now: 0x00000000
> > /private/b <access: R, policy_bitmask=0x00000002>
> >   policies seen until now: 0x00000002
> > /private <no rules>
> >   policies seen until now: 0x00000002
> > / <no rules>
> >   policies seen until now: 0x00000002
> >
> > It wouldn't encounter any rule from the first policy, so the OR of the
> > seen policy bitmasks would be 0x00000002, which is not the required
> > value 0x00000003, and so the access would be denied.
>
> As I understand your proposition, we need to build the required_bits
> when adding a rule or enforcing/merging a ruleset with a domain. The
> issue is that a rule only refers to a struct inode, not a struct path.
> For your proposition to work, we would need to walk through the file
> path when adding a rule to a ruleset, which means that we need to depend
> on the current view of the process (i.e. its mount namespace), and its
> Landlock domain.

I don't see why that is necessary. Why would we have to walk the file
path when adding a rule?

> If the required_bits field is set when the ruleset is
> merged with the domain, it is not possible anymore to walk through the
> corresponding initial file path, which makes the enforcement step too
> late to check for such consistency. The important point is that a
> ruleset/domain doesn't have a notion of file hierarchy, a ruleset is
> only a set of tagged inodes.
>
> I'm not sure I got your proposition right, though. When and how would
> you generate the required_bits?

Using your terminology:
A domain is a collection of N layers, which are assigned indices 0..N-1.
For each possible access type, a domain has a bitmask containing N bits
that stores which layers control that access type. (Basically a
per-layer version of fs_access_mask.)
To validate an access, you start by ORing together the bitmasks for
the requested access types; that gives you the required_bits mask,
which lists all layers that want to control the access.
Then you set seen_policy_bits=0, then do the
check_access_path_continue() loop while keeping track of which layers
you've seen with "seen_policy_bits |= access->contributing_policies",
or something like that.
And in the end, you check that seen_policy_bits is a superset of
required_bits - something like
`((~seen_policy_bits) & required_bits) == 0` (note the outer
parentheses: in C, == binds tighter than &).

AFAICS to create a new domain from a bunch of layers, you wouldn't
have to do any path walking.

> Here is my updated proposition: add a layer level and a depth to each
> rule (once enforced/merged with a domain), and a top layer level for a
> domain. When enforcing a ruleset (i.e. merging a ruleset into the
> current domain), the layer level of a new rule would be the incremented
> top layer level.
> If there is no rule (from this domain) tied to the same
> inode, then the depth of the new rule is 1.
> However, if there is already
> a rule tied to the same inode and if this rule's layer level is the
> previous top layer level, then the depth and the layer level are both
> incremented and the rule is updated with the new access rights (boolean
> AND).
>
> The policy looks like this:
> domain top_layer=2
> /a   RW policy_bitmask=0x00000003 layer=1 depth=1
> /a/b R  policy_bitmask=0x00000002 layer=2 depth=1
>
> The path walk access check walks through all inodes and starts with a
> layer counter equal to the top layer of the current domain. For each
> encountered inode tied to a rule, the access rights are checked and a
> new check ensures that the layer of the matching rule is the same as the
> counter (this may be a merged ruleset containing rules pertaining to the
> same hierarchy, which is fine) or equal to the decremented counter (i.e.
> the path walk just reached the underlying layer). If the path walk
> encounters a rule with a layer strictly less than the counter minus one,
> there is a hole in the layers which means that the ruleset
> hierarchy/subset does not match, and the access must be denied.
>
> When accessing a file at /private/b/foo for a read access:
> /private/b/foo <no rules>
>   allowed_access=unknown layer_counter=2
> /private/b <access: R, policy_bitmask=0x00000002, layer=2, depth=1>
>   allowed_access=allowed layer_counter=2
> /private <no rules>
>   allowed_access=allowed layer_counter=2
> / <no rules>
>   allowed_access=allowed layer_counter=2
>
> Because the layer_counter didn't reach 1, the access request is then denied.
>
> This proposition avoids relying on a parent ruleset at first, only
> when enforcing/merging a ruleset with a domain. This also solves the
> issue with multiple inherited/nested rules on the same inode (in which
> case the depth just grows). Moreover, this makes it possible to safely
> stop the path walk as soon as we reach the layer 1.

(FWIW, you could do the same optimization with the seen_policy_bits approach.)

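To make the seen_policy_bits check and the early stop a bit more
concrete, here is a rough userspace sketch. None of these names come
from the patch series: struct walk_step, the ACCESS_* constants and
access_permitted() are made up for illustration, each array stands in
for the rules a path walk would meet from the accessed file up to the
root, and the sketch only handles a request for a single access type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACCESS_R (1u << 0)
#define ACCESS_W (1u << 1)

/* Hypothetical: the rule (if any) tied to the inode at one walk step. */
struct walk_step {
	uint32_t allowed_access;        /* 0 if no rule on this inode */
	uint32_t contributing_policies; /* layers that contributed the rule */
};

/* /private/b/foo, /private/b, /private, / (from the example above) */
static const struct walk_step via_bind_mount[] = {
	{ 0, 0 }, { ACCESS_R, 0x2 }, { 0, 0 }, { 0, 0 },
};

/* /a/b/foo, /a/b, /a, / */
static const struct walk_step via_a[] = {
	{ 0, 0 }, { ACCESS_R, 0x2 }, { ACCESS_R | ACCESS_W, 0x3 }, { 0, 0 },
};

/* steps[] runs from the accessed file up to the root. */
static bool access_permitted(const struct walk_step *steps, size_t n_steps,
			     uint32_t request, uint32_t required_bits)
{
	uint32_t seen_policy_bits = 0;
	size_t i;

	/* Simplification: exactly one access type per request. */
	assert(request != 0 && (request & (request - 1)) == 0);

	for (i = 0; i < n_steps; i++) {
		if (steps[i].allowed_access & request)
			seen_policy_bits |= steps[i].contributing_policies;
		/* Early stop: every required layer has been seen. */
		if ((~seen_policy_bits & required_bits) == 0)
			return true;
	}
	return (~seen_policy_bits & required_bits) == 0;
}
```

With required_bits=0x3, a read through via_bind_mount never sees layer
0 and is denied, while a read through via_a stops early at /a once
both layers have been seen.
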
I guess the difference between your proposal and mine is that in my
proposal, the following would work, in effect permitting W access to
/foo/bar/baz (and nothing else)?

first ruleset:
  /foo W
second ruleset:
  /foo/bar/baz W
third ruleset:
  /foo/bar W

whereas in your proposal, IIUC it wouldn't be valid for a new ruleset
to whitelist a superset of what was whitelisted in a previous ruleset?
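
A toy userspace model of that three-ruleset example, in case it helps.
Everything here is hypothetical scaffolding rather than code from the
series: the four-entry parent[] table stands in for the dentry walk,
struct domain keeps one layer bitmask per node for W only, and
enforce_w_rule() stacks a one-rule ruleset as a new layer without
walking any path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_NODES 4

/* Toy hierarchy: 0 is "/", 1 is "/foo", 2 is "/foo/bar",
 * 3 is "/foo/bar/baz". parent[] gives the next step towards the root. */
static const int parent[N_NODES] = { -1, 0, 1, 2 };

struct domain {
	unsigned int n_layers;
	uint32_t w_layers;         /* layers that handle W */
	uint32_t w_rules[N_NODES]; /* per node: layers granting W here */
};

/* Enforce a ruleset that handles W and grants it on a single node.
 * Only the rule's own node is touched; no path walk is needed. */
static void enforce_w_rule(struct domain *d, int node)
{
	uint32_t bit;

	assert(d->n_layers < 32);
	assert(node >= 0 && node < N_NODES);
	bit = 1u << d->n_layers++;
	d->w_layers |= bit;
	d->w_rules[node] |= bit;
}

/* W is permitted only if every layer handling W was seen on the way up. */
static bool may_write(const struct domain *d, int node)
{
	uint32_t required_bits = d->w_layers;
	uint32_t seen_policy_bits = 0;

	for (; node >= 0; node = parent[node])
		seen_policy_bits |= d->w_rules[node];
	return (~seen_policy_bits & required_bits) == 0;
}
```

Enforcing /foo, then /foo/bar/baz, then /foo/bar gives
required_bits=0x7. Only a walk that starts at or below /foo/bar/baz
collects all three bits, so in this model may_write() accepts node 3
and rejects nodes 0 to 2.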