On Fri, 3 May 2024 11:02:57 +0200 Christian Brauner <brauner@xxxxxxxxxx> > On Thu, May 02, 2024 at 04:03:24PM -0700, Kees Cook wrote: > > On Fri, May 03, 2024 at 12:53:56AM +0200, Jann Horn wrote: > > > On Fri, May 3, 2024 at 12:34 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote: > > > > If f_count reaches 0, calling get_file() should be a failure. Adjust to > > > > use atomic_long_inc_not_zero() and return NULL on failure. In the future > > > > get_file() can be annotated with __must_check, though that is not > > > > currently possible. > > > [...] > > > > static inline struct file *get_file(struct file *f) > > > > { > > > > - atomic_long_inc(&f->f_count); > > > > + if (unlikely(!atomic_long_inc_not_zero(&f->f_count))) > > > > + return NULL; > > > > return f; > > > > } > > > > > > Oh, I really don't like this... > > > > > > In most code, if you call get_file() on a file and see refcount zero, > > > that basically means you're in a UAF write situation, or that you > > > could be in such a situation if you had raced differently. It's > > > basically just like refcount_inc() in that regard. > > > > Shouldn't the system attempt to not make things worse if it encounters > > an inc-from-0 condition? Yes, we've already lost the race for a UaF > > condition, but maybe don't continue on. > > > > > And get_file() has semantics just like refcount_inc(): The caller > > > guarantees that it is already holding a reference to the file; and if > > > > Yes, but if that guarantee is violated, we should do something about it. > > > > > the caller is wrong about that, their subsequent attempt to clean up > > > the reference that they think they were already holding will likely > > > lead to UAF too. If get_file() sees a zero refcount, there is no safe > > > way to continue. And all existing callers of get_file() expect the > > > return value to be the same as the non-NULL pointer they passed in, so > > > they'll either ignore the result of this check and barrel on, or oops > > > with a NULL deref. > > > > > > For callers that want to actually try incrementing file refcounts that > > > could be zero, which is only possible under specific circumstances, we > > > have helpers like get_file_rcu() and get_file_active(). > > > > So what's going on in here: > > https://lore.kernel.org/linux-hardening/20240502223341.1835070-2-keescook@xxxxxxxxxxxx/ > > Afaict, there's dma_buf_export() that allocates a new file and sets: > > file->private_data = dmabuf; > dmabuf->file = file; > > The file has f_op->release::dma_buf_file_release() as it's f_op->release > method. When that's called the file's refcount is already zero but the > file has not been freed yet. This will remove the dmabuf from some > public list but it won't free it. > > Then we see that any dentry allocated for such a dmabuf file will have > dma_buf_dentry_ops which in turn has > dentry->d_release::dma_buf_release() which is where the actual release > of the dma buffer happens taken from dentry->d_fsdata. > > That whole thing calls allocate_file_pseudo() which allocates a new > dentry specific to that struct file. That dentry is unhashed (no lookup) > and thus isn't retained so when dput() is called and it's the last > reference it's immediately followed by > dentry->d_release::dma_buf_release() which wipes the dmabuf itself. > > The lifetime of the dmabuf is managed via fget()/fput(). So the lifetime > of the dmabuf and the lifetime of the file are almost identical afaict: > > __fput() > -> f_op->release::dma_buf_file_release() // handles file specific freeing > -> dput() > -> d_op->d_release::dma_buf_release() // handles dmabuf freeing > // including the driver specific stuff. > > If you fput() the file then the dmabuf will be freed as well immediately > after it when the dput() happens in __fput() (I struggle to come up with > an explanation why the freeing of the dmabuf is moved to > dentry->d_release instead of f_op->release itself but that's a separate > matter.). > > So on the face of it without looking a little closer > > static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf) > { > return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L; > } > > looks wrong or broken. Because if dmabuf->file->f_count is 0 it implies > that @dmabuf should have already been freed. So the bug would be in > accessing @dmabuf. And if @dmabuf is valid then it automatically means > that dmabuf->file->f_count isn't 0. So it looks like it could just use > get_file(). > > _But_ the interesting bits are in ttm_object_device_init(). This steals > the dma_buf_ops into tdev->ops. It then takes dma_buf_ops->release and > stores it away into tdev->dma_buf_release. Then it overrides the > dma_buf_ops->release with ttm_prime_dmabuf_release(). And that's where > the very questionable magic happens. > > So now let's say the dmabuf is freed because of lat fput(). We now get > f_op->release::dma_buf_file_release(). Then it's followed by dput() and > ultimately dentry->d_release::dma_buf_release() as mentioned above. > > But now when we get: > > dentry->d_release::dma_buf_release() > -> dmabuf->ops->release::ttm_prime_dmabuf_release() > > instead of the original dmabuf->ops->release method that was stolen into > tdev->dmabuf_release. And ttm_prime_dmabuf_release() now calls > tdev->dma_buf_release() which just frees the data associated with the > dmabuf not the dmabuf itself. > > ttm_prime_dmabuf_release() then takes prime->mutex_lock replacing > prime->dma_buf with NULL. > > The same lock is taken in ttm_prime_handle_to_fd() which is picking that > dmabuf from prime->dmabuf. So the interesting case is when > ttm_prime_dma_buf_release() has called tdev->dmabuf_release() and but > someone else maanged to grab prime->mutex_lock before > ttm_prime_dma_buf_release() could grab it to NULL prime->dma_buf. > > So at that point @dmabuf hasn't been freed yet and is still valid. So > dereferencing prime->dma_buf is still valid and by extension > dma_buf->file as their lifetimes are tied. > > IOW, that should just use get_file_active() which handles that just > fine. > > And while that get_dma_buf_unless_doomed() thing is safe that whole code > reeks of a level of complexity that's asking for trouble. > > But that has zero to do with get_file() and it is absolutely not a > reason to mess with it's semantics impacting every caller in the tree. > > > > > > Can't you throw a CHECK_DATA_CORRUPTION() or something like that in > > > there instead? > > > > I'm open to suggestions, but given what's happening with struct dma_buf > > above, it seems like this is a state worth checking for? > > No, it's really not. If you use get_file() you better know that you're > already holding a valid reference that's no reason to make it suddenly > fail. > But the simple question is, why are you asking dma_buf to poll a 0 f_count file? All fine if you are not.