On Thu, Apr 9, 2020 at 12:01 AM Alexander Popov <alex.popov@xxxxxxxxx> wrote: > CVE-2019-18683 refers to three similar vulnerabilities caused by the same > incorrect approach to locking that is used in vivid_stop_generating_vid_cap(), > vivid_stop_generating_vid_out(), and sdr_cap_stop_streaming(). > > For fixes please see the commit 6dcd5d7a7a29c1e4 (media: vivid: Fix wrong > locking that causes race conditions on streaming stop). > > These three functions are called during streaming stopping with vivid_dev.mutex > locked. And they all do the same mistake while stopping their kthreads, which > need to lock this mutex as well. See the example from > vivid_stop_generating_vid_cap(): > /* shutdown control thread */ > vivid_grab_controls(dev, false); > mutex_unlock(&dev->mutex); > kthread_stop(dev->kthread_vid_cap); > dev->kthread_vid_cap = NULL; > mutex_lock(&dev->mutex); > > But when this mutex is unlocked, another vb2_fop_read() can lock it instead of > the kthread and manipulate the buffer queue. That causes use-after-free. > > I created a Coccinelle rule that detects mutex_unlock+kthread_stop+mutex_lock > within one function. [...] > mutex_unlock@unlock_p(E) > ... > kthread_stop@stop_p(...) > ... > mutex_lock@lock_p(E) Is the kthread_stop() really special here? It seems to me like it's pretty much just a normal instance of the "temporarily dropping a lock" pattern - which does tend to go wrong quite often, but can also be correct. I think it would be interesting though to have a list of places that drop and then re-acquire a mutex/spinlock/... that was not originally acquired in the same block of code (but was instead originally acquired in an outer block, or by a parent function, or something like that). So things like this: void X(...) { mutex_lock(A); for (...) { ... mutex_unlock(A); ... mutex_lock(A); ... } mutex_unlock(A); } or like this: void X(...) { ... [no mutex operations on A] mutex_unlock(A); ... mutex_lock(A); ... } But of course, there are places where this kind of behavior is correct; so such a script wouldn't just return report code, just code that could use a bit more scrutiny than normal. For example, in madvise_remove(), the mmap_sem is dropped and then re-acquired, which is fine because the caller deals with that possibility properly: static long madvise_remove(struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, unsigned long end) { loff_t offset; int error; struct file *f; *prev = NULL; /* tell sys_madvise we drop mmap_sem */ if (vma->vm_flags & VM_LOCKED) return -EINVAL; f = vma->vm_file; if (!f || !f->f_mapping || !f->f_mapping->host) { return -EINVAL; } if ((vma->vm_flags & (VM_SHARED|VM_WRITE)) != (VM_SHARED|VM_WRITE)) return -EACCES; offset = (loff_t)(start - vma->vm_start) + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); /* * Filesystem's fallocate may need to take i_mutex. We need to * explicitly grab a reference because the vma (and hence the * vma's reference to the file) can go away as soon as we drop * mmap_sem. */ get_file(f); if (userfaultfd_remove(vma, start, end)) { /* mmap_sem was not released by userfaultfd_remove() */ up_read(¤t->mm->mmap_sem); } error = vfs_fallocate(f, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, end - start); fput(f); down_read(¤t->mm->mmap_sem); return error; }