Re: [PATCH 5/9] replace do_setxattr() with saner helpers.

On 10/2/24 3:19 PM, Al Viro wrote:
> On Wed, Oct 02, 2024 at 12:00:45PM -0600, Jens Axboe wrote:
>> On 10/1/24 8:08 PM, Al Viro wrote:
>>> On Tue, Oct 01, 2024 at 07:34:12PM -0600, Jens Axboe wrote:
>>>
>>>>> -retry:
>>>>> -	ret = filename_lookup(AT_FDCWD, ix->filename, lookup_flags, &path, NULL);
>>>>> -	if (!ret) {
>>>>> -		ret = __io_setxattr(req, issue_flags, &path);
>>>>> -		path_put(&path);
>>>>> -		if (retry_estale(ret, lookup_flags)) {
>>>>> -			lookup_flags |= LOOKUP_REVAL;
>>>>> -			goto retry;
>>>>> -		}
>>>>> -	}
>>>>> -
>>>>> +	ret = filename_setxattr(AT_FDCWD, ix->filename, LOOKUP_FOLLOW, &ix->ctx);
>>>>>  	io_xattr_finish(req, ret);
>>>>>  	return IOU_OK;
>>>>
>>>> this looks like it needs an ix->filename = NULL, as
>>>> filename_{s,g}xattr() drops the reference. The previous internal helper
>>>> did not, and hence the cleanup always did it. But should work fine if
>>>> ->filename is just zeroed.
>>>>
>>>> Otherwise looks good. I've skimmed the other patches and didn't see
>>>> anything odd, I'll take a closer look tomorrow.
>>>
>>> Hmm...  I wonder if we would be better off with file{,name}_setxattr()
>>> doing kvfree(ctx->kvalue) - it makes things easier both on the syscall
>>> and on io_uring side.
>>>
>>> I've added minimal fixes (zeroing ix->filename after filename_[sg]etxattr())
>>> to 5/9 and 6/9 *and* added a followup calling conventions change at the end
>>> of the branch.  See #work.xattr2 in the same tree; FWIW, the followup
>>> cleanup is below; note that putname(ERR_PTR(-Ewhatever)) is an explicit
>>> no-op, so there's no need to zero on getname() failures.
>>
>> Looks good to me, thanks Al!
> 
> I'm still not sure if the calling conventions change is right - in the
> current form the last commit in there leaks ctx.kvalue in -EBADF case.
> It's easy to fix up, but... as far as I'm concerned, a large part of
> the point of the exercise is to come up with the right model for the
> calling conventions for that family of APIs.

The reason I liked the putname() is that it's unconditional - the caller
can rely on it being put, regardless of the return value. So I'd say the
same should be true for ctx.kvalue: either the helper always frees it, or
the caller always does. That's the path of least surprise - no leak on the
least tested error path, and no UAF in the success case.
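To make the "always consumed" convention concrete, here is a minimal userspace sketch. It assumes a hypothetical struct xattr_ctx and a hypothetical file_setxattr_consume() standing in for the kernel helper, with free() playing the role of kvfree(). The helper frees kvalue on every return path, including the early -EBADF case that currently leaks, so the caller never has to.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the xattr context; not the kernel struct. */
struct xattr_ctx {
	void *kvalue;	/* owned by the ctx until the helper runs */
	size_t size;
};

static int frees;	/* counts buffer frees so leaks are observable */

static void ctx_free_value(struct xattr_ctx *ctx)
{
	if (ctx->kvalue) {
		free(ctx->kvalue);	/* stands in for kvfree() */
		frees++;
	}
	ctx->kvalue = NULL;
}

/*
 * Hypothetical helper: consumes ctx->kvalue unconditionally.
 * Callers must not touch ctx->kvalue afterwards, whatever the result.
 */
static int file_setxattr_consume(int fd_valid, struct xattr_ctx *ctx)
{
	int ret = -EBADF;

	if (fd_valid)
		ret = 0;	/* the actual setxattr work would go here */

	ctx_free_value(ctx);	/* freed on success and on every error */
	return ret;
}

static struct xattr_ctx make_ctx(const char *val)
{
	struct xattr_ctx ctx;

	ctx.size = strlen(val) + 1;
	ctx.kvalue = malloc(ctx.size);
	assert(ctx.kvalue);	/* sketch: allocation failure is fatal */
	memcpy(ctx.kvalue, val, ctx.size);
	return ctx;
}
```

Whichever side ends up owning the free, the point is that the answer does not depend on the return value.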

For the put case, most other abstractions end up looking something like:

static int helper(struct file *file, ...)
{
	/* actual actions; file is borrowed and never put here */
}

int regular_sys_call(int fd, ...)
{
	struct fd f;
	int ret = -EBADF;

	f = fdget(fd);
	if (f.file) {
		ret = helper(f.file, ...);
		fdput(f);
	}

	return ret;
}

where io_uring calls helper() directly, the caller guarantees that the
file reference stays valid for the duration of helper(), and helper()
never drops that reference.

That's a bit different from your putname() case, but I think as long as
it's consistent regardless of return value, either approach is fine.
Maybe just add a comment about that? With a consistent convention, if it
blows up, it'll blow up instantly rather than be a surprise down the line
for "case x,y,z doesn't put it" or "case x,y,z always puts it, normal one
does not".

> I really want to get rid of that ad-hoc crap.  If we are to have what
> amounts to the alternative syscall interface, we'd better get it
> right.  I'm perfectly fine with having a set of "this is what the
> syscall is doing past marshalling arguments" primitives, but let's
> make sure they are properly documented and do not have landmines for
> callers to step into...

Fully agree.

> Questions on the io_uring side:
> 	* you usually reject REQ_F_FIXED_FILE for ...at() at ->prep() time.
> Fine, but... what's the point of doing that in IORING_OP_FGETXATTR case?
> Or IORING_OP_GETXATTR, for that matter, since you pass AT_FDCWD anyway...
> Am I missing something subtle here?

Right, it could be allowed for fgetxattr on the io_uring side. Anything
that passes in a struct file would be fair game to enable it on. For
anything that passes in a path (e.g. a non-fd value), a fixed file
obviously wouldn't make sense anyway.
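One way to express that rule is a per-opcode flag checked at prep time. The sketch below is hypothetical: the enum values, struct op_def, needs_path and model_prep() are illustrative names, not io_uring's actual opcode table or ->prep() signature, and returning -EBADF is just the sketch's choice. Only opcodes that resolve a path reject a fixed file; fd-based ones, like an fgetxattr-style op, accept it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative opcodes; not the real IORING_OP_* values. */
enum model_op { MODEL_OP_FGETXATTR, MODEL_OP_GETXATTR, MODEL_OP_NR };

struct op_def {
	bool needs_path;	/* op resolves a pathname instead of an fd */
};

static const struct op_def op_defs[MODEL_OP_NR] = {
	[MODEL_OP_FGETXATTR] = { .needs_path = false },
	[MODEL_OP_GETXATTR]  = { .needs_path = true },
};

/* Reject a fixed (registered) file only where it cannot mean anything. */
static int model_prep(enum model_op op, bool fixed_file)
{
	assert(op < MODEL_OP_NR);
	if (fixed_file && op_defs[op].needs_path)
		return -EBADF;
	return 0;
}
```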

> 	* what's to guarantee that pointers fetched by io_file_get_fixed()
> called from io_assign_file() will stay valid?  You do not bump the struct
> file refcount in this case, after all; what's to prevent unregistration
> from the main thread while the worker is getting through your request?
> Is that what the break on node->refs in the loop in io_rsrc_node_ref_zero()
> is about?  Or am I barking at the wrong tree here?  I realize that I'm about
> the last person to complain about the lack of documentation, but...
> 
> 	FWIW, my impression is that you have a list of nodes corresponding
> to overall resource states (which includes the file reference table) and
> have each borrow bump the use count on the node corresponding to the current
> state (at the tail of the list?)
> 	Each removal adds new node to the tail of the list, sticks the
> file reference there and tries to trigger io_rsrc_node_ref_zero() (which,
> for some reason, takes node instead of the node->ctx, even though it
> doesn't give a rat's arse about anything else in its argument).
> 	If there are nodes at the head of the list with zero use count,
> that takes them out, stopping at the first in-use node.  File reference
> stashed in a node is dropped when it's taken out.
> 
> 	If the above is more or less correct (and I'm pretty sure that it
> misses quite a few critical points), the rules would be equivalent to
> 	+ there is a use count associated with the table state.
> 	+ before we borrow a file reference from the table, we must bump
> that use count (see the call of __io_req_set_rsrc_node() in
> io_file_get_fixed()) and arrange for dropping it once we are done with
> the reference (io_put_rsrc_node() when freeing request, in io_free_batch_list())
> 	+ any removals from the table will switch to new state; dropping
> the removed reference is guaranteed to be delayed until use counts on
> all earlier states drop to zero.
> 
> 	How far are those rules from being accurate and how incomplete
> they are?  I hadn't looked into the quiescence-related stuff, which might
> or might not be relevant...

That is pretty darn accurate. The ordering of the rsrc nodes and the
break on node->refs ensure that a removed file stays valid until every
request using it has completed. And yes, it would be nice to document
that code a bit, but honestly I'd much rather make the referencing more
explicit, if that can be done cheaply enough. For now, I'll add some
comments, and I hope you do the same on your side! I don't ever remember
seeing an Al comment. Great emails, for sure, but not really comments.
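The rules above can be checked with a small userspace model. Everything here is hypothetical (struct model_ctx, model_borrow(), model_remove(), model_ref_zero()) and it mirrors the described behavior, not io_uring's actual rsrc code. In this model a removed file is stashed in the node that was current when it was removed; that node is retired to the list and a fresh node becomes current. The head-of-list loop stops at the first node still pinned, which is the role of the break on node->refs.

```c
#include <assert.h>

/* Hypothetical model of the ordered resource-node list. */
#define MAX_NODES 16

struct model_node {
	int refs;	/* borrowers of the table state this node represents */
	int stashed;	/* id of a removed file awaiting its final put, or -1 */
};

struct model_ctx {
	struct model_node nodes[MAX_NODES];
	int head;	/* oldest retired node still on the list */
	int current;	/* live node; retired list is [head, current) */
	int dropped[MAX_NODES];	/* file ids whose reference was dropped */
	int ndropped;
};

static void model_init(struct model_ctx *ctx)
{
	ctx->head = ctx->current = 0;
	ctx->ndropped = 0;
	ctx->nodes[0].refs = 1;	/* the ctx's own reference on the live node */
	ctx->nodes[0].stashed = -1;
}

/* Retire nodes from the head in order, stopping at the first in use. */
static void model_ref_zero(struct model_ctx *ctx)
{
	while (ctx->head < ctx->current) {
		struct model_node *n = &ctx->nodes[ctx->head];

		if (n->refs)
			break;
		if (n->stashed >= 0)
			ctx->dropped[ctx->ndropped++] = n->stashed; /* final put */
		ctx->head++;
	}
}

/* Borrow a table reference: pin the node for the current state. */
static int model_borrow(struct model_ctx *ctx)
{
	ctx->nodes[ctx->current].refs++;
	return ctx->current;
}

/* Request done: unpin the node it borrowed under. */
static void model_release(struct model_ctx *ctx, int node)
{
	if (--ctx->nodes[node].refs == 0)
		model_ref_zero(ctx);
}

/* Remove file_id: stash it in the current node and switch to a new one. */
static void model_remove(struct model_ctx *ctx, int file_id)
{
	int old = ctx->current;

	assert(old + 1 < MAX_NODES);	/* sketch: no node recycling */
	ctx->nodes[old].stashed = file_id;
	ctx->current = old + 1;
	ctx->nodes[ctx->current].refs = 1;
	ctx->nodes[ctx->current].stashed = -1;
	model_release(ctx, old);	/* drop the ctx's own ref on old */
}
```

With one request pinning the first state, removing two files and finishing a second request first drops nothing; both puts happen, in order, only once the first request completes.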

-- 
Jens Axboe
