On Wed, Oct 02, 2024 at 12:00:45PM -0600, Jens Axboe wrote:
> On 10/1/24 8:08 PM, Al Viro wrote:
> > On Tue, Oct 01, 2024 at 07:34:12PM -0600, Jens Axboe wrote:
> >
> >>> -retry:
> >>> -       ret = filename_lookup(AT_FDCWD, ix->filename, lookup_flags, &path, NULL);
> >>> -       if (!ret) {
> >>> -               ret = __io_setxattr(req, issue_flags, &path);
> >>> -               path_put(&path);
> >>> -               if (retry_estale(ret, lookup_flags)) {
> >>> -                       lookup_flags |= LOOKUP_REVAL;
> >>> -                       goto retry;
> >>> -               }
> >>> -       }
> >>> -
> >>> +       ret = filename_setxattr(AT_FDCWD, ix->filename, LOOKUP_FOLLOW, &ix->ctx);
> >>>         io_xattr_finish(req, ret);
> >>>         return IOU_OK;
> >>
> >> this looks like it needs an ix->filename = NULL, as
> >> filename_{s,g}xattr() drops the reference. The previous internal helper
> >> did not, and hence the cleanup always did it. But should work fine if
> >> ->filename is just zeroed.
> >>
> >> Otherwise looks good. I've skimmed the other patches and didn't see
> >> anything odd, I'll take a closer look tomorrow.
> >
> > Hmm... I wonder if we would be better off with file{,name}_setxattr()
> > doing kvfree(cxt->kvalue) - it makes things easier both on the syscall
> > and on io_uring side.
> >
> > I've added minimal fixes (zeroing ix->filename after filename_[sg]etxattr())
> > to 5/9 and 6/9 *and* added a followup calling conventions change at the end
> > of the branch.  See #work.xattr2 in the same tree; FWIW, the followup
> > cleanup is below; note that putname(ERR_PTR(-Ewhatever)) is an explicit
> > no-op, so there's no need to zero on getname() failures.
>
> Looks good to me, thanks Al!

I'm still not sure if the calling conventions change is right - in the
current form the last commit in there leaks ctx.kvalue in -EBADF case.
It's easy to fix up, but... as far as I'm concerned, a large part of
the point of the exercise is to come up with the right model for the
calling conventions for that family of APIs.

I really want to get rid of that ad-hoc crap.
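To make the ownership question concrete, here is a rough userspace sketch of
the "callee consumes the reference, caller zeroes its pointer" convention
discussed in the quoted exchange.  It is not kernel code: struct name,
name_get(), name_put(), consume_name() and run_request() are made-up
stand-ins (loosely for struct filename, getname(), putname() and
filename_setxattr()), and the refcounting is deliberately simplified.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted name object. */
struct name {
	int refs;
};

static int live_names;	/* objects allocated and not yet freed */

static struct name *name_get(void)
{
	struct name *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->refs = 1;
	live_names++;
	return n;
}

/* Dropping NULL is a no-op, so unconditional cleanup stays safe. */
static void name_put(struct name *n)
{
	if (!n)
		return;
	if (--n->refs == 0) {
		free(n);
		live_names--;
	}
}

/* The callee drops the caller's reference on every path, error or not. */
static int consume_name(struct name *n, int fail)
{
	int ret = fail ? -1 : 0;

	name_put(n);
	return ret;
}

struct request {
	struct name *filename;
};

static int issue(struct request *req, int fail)
{
	int ret = consume_name(req->filename, fail);

	/* The reference is gone; zero it so cleanup cannot put it twice. */
	req->filename = NULL;
	return ret;
}

/* Cleanup runs whether or not issue() was ever reached. */
static void cleanup(struct request *req)
{
	name_put(req->filename);
	req->filename = NULL;
}

/* Runs one request and returns how many names are still allocated. */
int run_request(int fail)
{
	struct request req = { .filename = name_get() };

	issue(&req, fail);
	cleanup(&req);
	return live_names;
}

#ifdef SKETCH_MAIN
int main(void)
{
	printf("live names, success path: %d\n", run_request(0));
	printf("live names, failure path: %d\n", run_request(1));
	return 0;
}
#endif
```

The point of the sketch is only that a "consumes on all paths" rule gives the
caller one thing to remember (zero the pointer), regardless of the return
value.  Build with -DSKETCH_MAIN to get a runnable program.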
If we are to have what amounts to the alternative syscall interface, we'd
better get it right.  I'm perfectly fine with having a set of "this is what
the syscall is doing past marshalling arguments" primitives, but let's make
sure they are properly documented and do not have landmines for callers to
step into...

Questions on the io_uring side:
        * you usually reject REQ_F_FIXED_FILE for ...at() at ->prep() time.
Fine, but... what's the point of doing that in IORING_OP_FGETXATTR case?
Or IORING_OP_GETXATTR, for that matter, since you pass AT_FDCWD anyway...
Am I missing something subtle here?
        * what's to guarantee that pointers fetched by io_file_get_fixed()
called from io_assign_file() will stay valid?  You do not bump the struct
file refcount in this case, after all; what's to prevent unregistration
from the main thread while the worker is getting through your request?
Is that what the break on node->refs in the loop in io_rsrc_node_ref_zero()
is about?  Or am I barking up the wrong tree here?

I realize that I'm about the last person to complain about the lack of
documentation, but...

FWIW, my impression is that you have a list of nodes corresponding to
overall resource states (which includes the file reference table) and
have each borrow bump the use count on the node corresponding to the
current state (at the tail of the list?)  Each removal adds a new node to
the tail of the list, sticks the file reference there and tries to trigger
io_rsrc_node_ref_zero() (which, for some reason, takes node instead of
the node->ctx, even though it doesn't give a rat's arse about anything
else in its argument).  If there are nodes at the head of the list with
zero use count, that takes them out, stopping at the first in-use node.
File reference stashed in a node is dropped when it's taken out.

If the above is more or less correct (and I'm pretty sure that it misses
quite a few critical points), the rules would be equivalent to
        + there is a use count associated with the table state.
        + before we borrow a file reference from the table, we must bump
that use count (see the call of __io_req_set_rsrc_node() in
io_file_get_fixed()) and arrange for dropping it once we are done with
the reference (io_put_rsrc_node() when freeing request, in
io_free_batch_list())
        + any removals from the table will switch to new state; dropping
the removed reference is guaranteed to be delayed until use counts on
all earlier states drop to zero.

How far are those rules from being accurate, and how incomplete are they?
I hadn't looked into the quiescence-related stuff, which might or might
not be relevant...
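
For what it's worth, the three rules as stated can be written down as a toy
userspace model.  This is only a sketch of the rules, not of io_uring's
actual data structures: struct state, struct table, table_borrow(),
table_put() and table_remove() are all invented names, and it assumes the
removed file is stashed with the state that was current when the removal
happened, which may or may not match what the real code does.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_STATES 16

/* One table state: its use count and the file removed when it was retired. */
struct state {
	int uses;
	int stashed_file;	/* -1 if nothing is stashed */
};

struct table {
	struct state states[MAX_STATES];
	int head;		/* oldest state not yet reaped */
	int cur;		/* current state, i.e. the tail */
	int dropped[MAX_STATES];/* files actually dropped, in order */
	int nr_dropped;
};

static void table_init(struct table *t)
{
	t->head = t->cur = 0;
	t->nr_dropped = 0;
	t->states[0] = (struct state){ .uses = 0, .stashed_file = -1 };
}

/* Take retired states off the head, stopping at the first one in use. */
static void reap(struct table *t)
{
	while (t->head < t->cur && t->states[t->head].uses == 0) {
		struct state *s = &t->states[t->head++];

		if (s->stashed_file >= 0)
			t->dropped[t->nr_dropped++] = s->stashed_file;
	}
}

/* Rule 2, first half: bump the use count of the current state. */
static int table_borrow(struct table *t)
{
	t->states[t->cur].uses++;
	return t->cur;
}

/* Rule 2, second half: drop the use count once done with the reference. */
static void table_put(struct table *t, int idx)
{
	assert(t->states[idx].uses > 0);
	t->states[idx].uses--;
	reap(t);
}

/* Rule 3: a removal switches to a new state; the drop itself is deferred. */
static void table_remove(struct table *t, int file)
{
	assert(t->cur + 1 < MAX_STATES);
	t->states[t->cur].stashed_file = file;
	t->cur++;
	t->states[t->cur] = (struct state){ .uses = 0, .stashed_file = -1 };
	reap(t);
}

#ifdef SKETCH_MAIN
int main(void)
{
	struct table t;
	int a;

	table_init(&t);
	a = table_borrow(&t);
	table_remove(&t, 42);
	printf("dropped while borrowed: %d\n", t.nr_dropped);
	table_put(&t, a);
	printf("dropped after put: %d\n", t.nr_dropped);
	return 0;
}
#endif
```

In this model a borrower never sees its file dropped underneath it, because
the drop waits for every state at or before the one it borrowed from to go
idle.  Whether that is the invariant the real code provides is exactly the
question above.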