On Tue, Oct 9, 2018 at 5:36 PM Aleksa Sarai <asarai@xxxxxxx> wrote: > On 2018-10-09, 'Jann Horn' via dev <jannh@xxxxxxxxxx> wrote: > > On Tue, Oct 9, 2018 at 9:03 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote: > > > This patch allows for AT_BENEATH and AT_THIS_ROOT to safely permit ".." > > > resolution (in the case of AT_BENEATH the resolution will still fail if > > > ".." resolution would resolve a path outside of the root -- while > > > AT_THIS_ROOT will chroot(2)-style scope it). "proclink" jumps are still > > > disallowed entirely because now they could result in inconsistent > > > behaviour if resolution encounters a subsequent "..". > > > > > > The need for this patch is explained by observing there is a fairly > > > easy-to-exploit race condition with chroot(2) (and thus by extension > > > AT_THIS_ROOT and AT_BENEATH) where a rename(2) of a path can be used to > > > "skip over" nd->root and thus escape to the filesystem above nd->root. > > > > > > thread1 [attacker]: > > > for (;;) > > > renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); > > > thread2 [victim]: > > > for (;;) > > > openat(dirb, "b/c/../../etc/shadow", O_THISROOT); > > > > > > With fairly significant regularity, thread2 will resolve to > > > "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar > > > (though somewhat more privileged) attack using MS_MOVE. > > > > > > With this patch, such cases will be detected *during* ".." resolution > > > (which is the weak point of chroot(2) -- since walking *into* a > > > subdirectory tautologically cannot result in you walking *outside* > > > nd->root -- except through a bind-mount or "proclink"). By detecting > > > this at ".." resolution (rather than checking only at the end of the > > > entire resolution) we can both correct escapes by jumping back to the > > > root (in the case of AT_THIS_ROOT), as well as avoid revealing to > > > attackers the structure of the filesystem outside of the root (through > > > timing attacks for instance). > > > > > > In order to avoid a quadratic lookup with each ".." entry, we only > > > activate the slow path if a write through &rename_lock or &mount_lock > > > have occurred during path resolution (&rename_lock and &mount_lock are > > > re-taken to further optimise the lookup). Since the primary attack being > > > protected against is MS_MOVE or rename(2), not doing additional checks > > > unless a mount or rename have occurred avoids making the common case > > > slow. > > > > > > The use of __d_path here might seem suspect, but on further inspection > > > of the most important race (a path was *inside* the root but is now > > > *outside*), there appears to be no attack potential. If __d_path occurs > > > before the rename, then the path will be resolved but since the path was > > > originally inside the root there is no escape. Subsequent ".." jumps are > > > guaranteed to check __d_path reachable (by construction, &rename_lock or > > > &mount_lock must have been taken after __d_path returned), > > > > "after"? Don't you mean "before"? Otherwise I don't understand what > > you're saying here. > > I meant that the attacker doing the rename must've taken &rename_lock > or &mount_lock after __d_path returns in the target process (because the > race being examined is that the rename occurs *after* __d_path) and thus > are guaranteed to be detected). > > Maybe there's a better way to phrase what I mean... Aah, I thought you were referring to what the victim process is doing, not what the racing attacker is doing.