On Mon, Jan 11, 2016 at 3:16 PM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote:
> On Tue, Jan 12, 2016 at 1:45 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> On Mon, Jan 11, 2016 at 2:39 PM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote:
>>> On Mon, Jan 11, 2016 at 10:38 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>> On Sun, Jan 10, 2016 at 7:48 AM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote:
>>>>> On Sat, Jan 9, 2016 at 2:27 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>>>> those bits are cleared when they are not the file owner or a member
>>>>>> of the group. This is enforced when using write and truncate but not
>>>>>> when writing to a shared mmap on the file. This could allow the file
>>>>>> writer to gain privileges by changing a binary without losing the
>>>>>> setuid/setgid/caps bits.
>>>>>>
>>>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>>>> during the page fault (due to mmap_sem being held during the fault). We
>>>>>> could do this during vm_mmap_pgoff, but that would need coverage in
>>>>>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
>>>>>> again. We could clear at open() time, but it's possible things are
>>>>>> accidentally opening with O_RDWR and only reading. Better to clear on
>>>>>> close and error failures (i.e. an improvement over now, which is not
>>>>>> clearing at all).
>>>>>
>>>>> I think this should be done in mmap/mprotect. Code in sys_mmap is trivial.
>>>>>
>>>>> In sys_mprotect you can check file_needs_remove_privs() and VM_SHARED
>>>>> under mmap_sem, then if needed grab reference to struct file from vma and
>>>>> clear suid after unlocking mmap_sem.
>>>>>
>>>>> I haven't seen previous iterations, probably this approach has known flaws.
>>>>
>>>> mmap_sem is still needed in mprotect (to find and hold the vma), so
>>>> it's not possible. I'd love to be proven wrong, but I didn't see a
>>>> way.
>>>
>>> something like this
>>>
>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>
>>>         vm_flags = calc_vm_prot_bits(prot);
>>>
>>> +restart:
>>>         down_write(&current->mm->mmap_sem);
>>>
>>>         vma = find_vma(current->mm, start);
>>> @@ -416,6 +418,21 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>                         goto out;
>>>                 }
>>>
>>> +               if ((newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>> +                   vma->vm_file && file_needs_remove_privs(vma->vm_file)) {
>>> +                       struct file *file = get_file(vma->vm_file);
>>> +
>>> +                       start = vma->vm_start;
>>> +                       up_write(&current->mm->mmap_sem);
>>> +                       mutex_lock(&file_inode(file)->i_mutex);
>>> +                       error = file_remove_privs(file);
>>> +                       mutex_unlock(&file_inode(file)->i_mutex);
>>> +                       fput(file);
>>> +                       if (error)
>>> +                               return error;
>>> +                       goto restart;
>>> +               }
>>> +
>>
>> Is this safe against the things Al mentioned? I still don't like the
>> mmap/mprotect approach because it makes the change before anything was
>> actually written...
>
> (I forgot to check VM_SHARED)
>
> Yep, this should be safe.
>
> I think suid should be cleared before any possible change of data.
> New content could hit the disk but suid never be cleared,
> for example if system suddenly crashed or rebooted.

Oooh, very good point. Yeah, that's enough to convince me. :) Ignore my v7...

Al, are you okay with this semantic change?

-Kees

--
Kees Cook
Chrome OS & Brillo Security
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html