On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote: > On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote: >> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@xxxxxxx> wrote: >>> On Tue 12-01-16 11:09:04, Kees Cook wrote: >>>> Normally, when a user can modify a file that has setuid or setgid bits, >>>> those bits are cleared when they are not the file owner or a member >>>> of the group. This is enforced when using write and truncate but not >>>> when writing to a shared mmap on the file. This could allow the file >>>> writer to gain privileges by changing a binary without losing the >>>> setuid/setgid/caps bits. >>>> >>>> Changing the bits requires holding inode->i_mutex, so it cannot be done >>>> during the page fault (due to mmap_sem being held during the fault). >>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time, >>>> or added at mprotect time. >>>> >>>> Since we can't do the check in the right place inside mmap (due to >>>> holding mmap_sem), we have to do it before holding mmap_sem, which >>>> means duplicating some checks, which have to be available to the non-MMU >>>> builds too. >>>> >>>> When walking VMAs during mprotect, we need to drop mmap_sem (while >>>> holding a file reference) and restart the walk after clearing privileges. >>> >>> ... >>> >>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, >>>> >>>> vm_flags = calc_vm_prot_bits(prot); >>>> >>>> +restart: >>>> down_write(¤t->mm->mmap_sem); >>>> >>>> vma = find_vma(current->mm, start); >>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, >>>> goto out; >>>> } >>>> >>>> + /* >>>> + * If we're adding write permissions to a shared file, >>>> + * we must clear privileges (like done at mmap time), >>>> + * but we have to juggle the locks to avoid holding >>>> + * mmap_sem while holding i_mutex. >>>> + */ >>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file && >>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) && >>>> + !IS_NOSEC(file_inode(vma->vm_file))) { >>> >>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs() >>> is called. However that is not true for two reasons: >>> >>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set >>> IS_NOSEC. >>> >>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is >>> never true. >>> >>> So in these cases you'll loop forever. >> >> UUuugh. >> >>> >>> You can check SUID bits without i_mutex so that could be done without >>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv() >>> without i_mutex as that checks extended attributes (IMA) and that needs >>> i_mutex to be held to avoid races with someone else changing the attributes >>> under you. >> >> Yeah, that's why I changed this from Konstantin's original suggestion. >> >>> Honestly, I don't see a way of implementing this in mprotect() which would >>> be reasonably elegant. >> >> Konstantin, any thoughts here? > > Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr doesn't lock it. > If somebody changes xattrs under us we'll end up in race anyway. > But this still safe: setxattrs are sychronized. So I can swap my IS_NOSEC for your original file_needs_remove_privs()? Are the LSM hooks expecting to be called under mm_sem? (Looks like only common_caps implements that, though.) -Kees -- Kees Cook Chrome OS & Brillo Security -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>