On 12/6/23 12:59, Alice Ryhl wrote: > + /// Schedule a task work that closes the file descriptor when this task returns to userspace. > + /// > + /// Fails if this is called from a context where we cannot run work when returning to > + /// userspace. (E.g., from a kthread.) > + pub fn close_fd(self, fd: u32) -> Result<(), DeferredFdCloseError> { > + use bindings::task_work_notify_mode_TWA_RESUME as TWA_RESUME; > + > + // In this method, we schedule the task work before closing the file. This is because > + // scheduling a task work is fallible, and we need to know whether it will fail before we > + // attempt to close the file. > + > + // SAFETY: Getting a pointer to current is always safe. > + let current = unsafe { bindings::get_current() }; > + > + // SAFETY: Accessing the `flags` field of `current` is always safe. > + let is_kthread = (unsafe { (*current).flags } & bindings::PF_KTHREAD) != 0; Since Boqun brought to my attention that we already have a wrapper for `get_current()`, how about you use it here as well? > + if is_kthread { > + return Err(DeferredFdCloseError::TaskWorkUnavailable); > + } > + > + // This disables the destructor of the box, so the allocation is not cleaned up > + // automatically below. > + let inner = Box::into_raw(self.inner); Importantly this also lifts the uniqueness requirement (maybe add this to the comment?). > + > + // The `callback_head` field is first in the struct, so this cast correctly gives us a > + // pointer to the field. > + let callback_head = inner.cast::<bindings::callback_head>(); > + // SAFETY: This pointer offset operation does not go out-of-bounds. > + let file_field = unsafe { core::ptr::addr_of_mut!((*inner).file) }; > + > + // SAFETY: The `callback_head` pointer is compatible with the `do_close_fd` method. Also, `callback_head` is valid, since it is derived from... > + unsafe { bindings::init_task_work(callback_head, Some(Self::do_close_fd)) }; > + // SAFETY: The `callback_head` pointer points at a valid and fully initialized task work > + // that is ready to be scheduled. > + // > + // If the task work gets scheduled as-is, then it will be a no-op. However, we will update > + // the file pointer below to tell it which file to fput. > + let res = unsafe { bindings::task_work_add(current, callback_head, TWA_RESUME) }; > + > + if res != 0 { > + // SAFETY: Scheduling the task work failed, so we still have ownership of the box, so > + // we may destroy it. > + unsafe { drop(Box::from_raw(inner)) }; > + > + return Err(DeferredFdCloseError::TaskWorkUnavailable); Just curious, what could make the `task_work_add` call fail? I imagine an OOM situation, but is that all? > + } > + > + // SAFETY: Just an FFI call. This is safe no matter what `fd` is. I took a look at the C code and there I found this comment: /* * variant of close_fd that gets a ref on the file for later fput. * The caller must ensure that filp_close() called on the file. */ And while you do call `filp_close` later, this seems like a safety requirement to me. Also, you do not call it when `file` is null, which I imagine to be fine, but I do not know that since the C comment does not cover that case. > + let file = unsafe { bindings::close_fd_get_file(fd) }; > + if file.is_null() { > + // We don't clean up the task work since that might be expensive if the task work queue > + // is long. Just let it execute and let it clean up for itself. > + return Err(DeferredFdCloseError::BadFd); > + } > + > + // SAFETY: The `file` pointer points at a valid file. > + unsafe { bindings::get_file(file) }; > + > + // SAFETY: Due to the above `get_file`, even if the current task holds an `fdget` to > + // this file right now, the refcount will not drop to zero until after it is released > + // with `fdput`. This is because when using `fdget`, you must always use `fdput` before Shouldn't this be "the refcount will not drop to zero until after it is released with `fput`."? Why is this the SAFETY comment for `filp_close`? I am not understanding the requirement on that function that needs this. This seems more a justification for accessing `file` inside `do_close_fd`. In which case I think it would be better to make it a type invariant of `DeferredFdCloserInner`. > + // returning to userspace, and our task work runs after any `fdget` users have returned > + // to userspace. > + // > + // Note: fl_owner_t is currently a void pointer. > + unsafe { bindings::filp_close(file, (*current).files as bindings::fl_owner_t) }; > + > + // We update the file pointer that the task work is supposed to fput. > + // > + // SAFETY: Task works are executed on the current thread once we return to userspace, so > + // this write is guaranteed to happen before `do_close_fd` is called, which means that a > + // race is not possible here. > + // > + // It's okay to pass this pointer to the task work, since we just acquired a refcount with > + // the previous call to `get_file`. Furthermore, the refcount will not drop to zero during > + // an `fdget` call, since we defer the `fput` until after returning to userspace. > + unsafe { *file_field = file }; A synchronization question: who guarantees that this write is actually available to the cpu that executes `do_close_fd`? Is there some synchronization run when returning to userspace? > + > + Ok(()) > + } > + > + // SAFETY: This function is an implementation detail of `close_fd`, so its safety comments > + // should be read in extension of that method. Why not use this?: - `inner` is a valid pointer to the `callback_head` field of a valid `DeferredFdCloserInner`. - `inner` has exclusive access to the pointee and owns the allocation. - `inner` originates from a call to `Box::into_raw`. > + unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) { > + // SAFETY: In `close_fd` we use this method together with a pointer that originates from a > + // `Box<DeferredFdCloserInner>`, and we have just been given ownership of that allocation. > + let inner = unsafe { Box::from_raw(inner as *mut DeferredFdCloserInner) }; Use `.cast()`. > + if !inner.file.is_null() { > + // SAFETY: This drops a refcount we acquired in `close_fd`. Since this callback runs in > + // a task work after we return to userspace, it is guaranteed that the current thread > + // doesn't hold this file with `fdget`, as `fdget` must be released before returning to > + // userspace. > + unsafe { bindings::fput(inner.file) }; > + } > + // Free the allocation. > + drop(inner); > + } > +} > + > +/// Represents a failure to close an fd in a deferred manner. > +#[derive(Copy, Clone, Eq, PartialEq)] > +pub enum DeferredFdCloseError { > + /// Closing the fd failed because we were unable to schedule a task work. > + TaskWorkUnavailable, > + /// Closing the fd failed because the fd does not exist. > + BadFd, > +} > + > +impl From<DeferredFdCloseError> for Error { > + fn from(err: DeferredFdCloseError) -> Error { > + match err { > + DeferredFdCloseError::TaskWorkUnavailable => ESRCH, This error reads "No such process", I am not sure if that is the best way to express the problem in that situation. I took a quick look at the other error codes, but could not find a better fit. Do you have any better ideas? Or is this the error that C binder uses? -- Cheers, Benno > + DeferredFdCloseError::BadFd => EBADF, > + } > + } > +}