On Tue, Jan 12, 2016 at 1:45 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: > On Mon, Jan 11, 2016 at 2:39 PM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote: >> On Mon, Jan 11, 2016 at 10:38 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >>> On Sun, Jan 10, 2016 at 7:48 AM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote: >>>> On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >>>>> Normally, when a user can modify a file that has setuid or setgid bits, >>>>> those bits are cleared when they are not the file owner or a member >>>>> of the group. This is enforced when using write and truncate but not >>>>> when writing to a shared mmap on the file. This could allow the file >>>>> writer to gain privileges by changing a binary without losing the >>>>> setuid/setgid/caps bits. >>>>> >>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done >>>>> during the page fault (due to mmap_sem being held during the fault). We >>>>> could do this during vm_mmap_pgoff, but that would need coverage in >>>>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem >>>>> again. We could clear at open() time, but it's possible things are >>>>> accidentally opening with O_RDWR and only reading. Better to clear on >>>>> close and error failures (i.e. an improvement over now, which is not >>>>> clearing at all). >>>> >>>> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial. >>>> >>>> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED >>>> under mmap_sem, then if needed grab reference to struct file from vma and >>>> clear suid after unlocking mmap_sem. >>>> >>>> I haven't seen previous iterations, probably this approach has known flaws. >>> >>> mmap_sem is still needed in mprotect (to find and hold the vma), so >>> it's not possible. I'd love to be proven wrong, but I didn't see a >>> way. >> >> something like this >> >> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, >> >> vm_flags = calc_vm_prot_bits(prot); >> >> +restart: >> down_write(¤t->mm->mmap_sem); >> >> vma = find_vma(current->mm, start); >> @@ -416,6 +418,21 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, >> size_t, len, >> goto out; >> } >> >> + if ((newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) && >> + vma->vm_file && file_needs_remove_privs(vma->vm_file)) { >> + struct file *file = get_file(vma->vm_file); >> + >> + start = vma->vm_start; >> + up_write(¤t->mm->mmap_sem); >> + mutex_lock(&file_inode(file)->i_mutex); >> + error = file_remove_privs(file); >> + mutex_unlock(&file_inode(file)->i_mutex); >> + fput(file); >> + if (error) >> + return error; >> + goto restart; >> + } >> + > > Is this safe against the things Al mentioned? I still don't like the > mmap/mprotect approach because it makes the change before anything was > actually written... (I forgot to check VM_SHARED) Yep, this should be safe. I think suid should be cleared before any possible change of data. New content could hit the disk but suid never be cleared, for example if system suddenly crashed or rebooted. > > -Kees > >> >> >>> >>> -Kees >>> >>> -- >>> Kees Cook >>> Chrome OS & Brillo Security > > > > -- > Kees Cook > Chrome OS & Brillo Security -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html