On 12/30/21 10:01 AM, Christian Brauner wrote:
>> However, what I really want to see is the answer to my question re control
>> flow and the place where we do copy the arguments from userland. Including
>> the pathname.
>>
>> *IF* there's a subtle reason that has to be done from prep phase (and there
>> might very well be - figuring out the control flow in io_uring is bloody
>> painful), I would really like to see it spelled out, along with the explanation
>> of the reasons why statx() doesn't need anything of that sort.
>>
>> If there's no such reasons, I would bloody well leave marshalling to the
>
> That's really something the io_uring folks should explain to us. I can't
> help much there either apart from what I can gather from looking through
> the io_req_prep() switch.
>
> Where it's clear that nearly all syscall-operations immediately do a
> getname() and/or copy their arguments in the *_prep() phase as, not in
> the actual "do-the-work" phase. For example, io_epoll_ctl_prep() which
> copies struct epoll_event via copy_from_user(). It doesn't do it in
> io_epoll_ctl(). So as such io_statx_prep() is the outlier...

For each command, there are two steps:

- The prep of it, this happens inline from the system call where the
  request, or requests, are submitted. The prep phase should ensure that
  argument structs are stable. Hence a caller can prep a request and have
  memory on stack, as long as it submits before it becomes invalid. An
  example of that are iovecs for readv/writev. The caller does not need
  to have them stable for the duration of the request, just across
  submit. That's the io_${cmd}_prep() helpers.

- The execution of it. May be separate from prep and from an async
  worker. Where the lower layers don't support a nonblocking attempt,
  they are always done async. The statx stuff is an example of that.

Hence prep needs to copy from userland on the prep side always for the
statx family, as execution will happen out-of-line from the submission.

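As a rough userspace sketch of that split (this is not the io_uring
code: sketch_req, sketch_prep() and sketch_exec() are made-up names,
and malloc()/free() merely stand in for getname()/putname()), prep
copies whatever the caller handed in, and execution later touches only
those copies:

```c
/* Userspace illustration only; every name here is hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct sketch_req {
	char *filename;		/* stable copy owned by the request */
	unsigned int mask;	/* scalar argument captured at prep time */
};

/* Prep: runs inline from submit, copies anything the caller may reuse. */
static int sketch_prep(struct sketch_req *req, const char *user_path,
		       unsigned int mask)
{
	size_t len = strlen(user_path) + 1;

	req->filename = malloc(len);	/* stands in for getname() */
	if (!req->filename)
		return -1;
	memcpy(req->filename, user_path, len);
	req->mask = mask;
	return 0;
}

/* Exec: may run later from a worker, so it only uses the request's copies. */
static int sketch_exec(struct sketch_req *req, char *out, size_t outlen)
{
	int ret = snprintf(out, outlen, "statx(%s, mask=0x%x)",
			   req->filename, req->mask);

	free(req->filename);		/* stands in for putname() */
	req->filename = NULL;
	return (ret < 0 || (size_t)ret >= outlen) ? -1 : 0;
}
```

With that shape, the caller can overwrite or drop its path buffer as
soon as sketch_prep() returns, and sketch_exec() still sees the
original string.
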
Does that explain it?

-- 
Jens Axboe