On 15/07/2020 20:11, Matthew Wilcox wrote:
> On Wed, Jul 15, 2020 at 07:35:50AM -0700, Andy Lutomirski wrote:
>>> On Jul 15, 2020, at 4:12 AM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>
>>> <feff>Hi,
>
> feff? Are we doing WTF-16 in email now? ;-)
>
>>>
>>> This thread is to discuss the possibility of stracing requests
>>> submitted through io_uring. I'm not directly involved in io_uring
>>> development, so I'm posting this out of interest in using strace on
>>> processes utilizing io_uring.
>>>
>>> io_uring gives the developer a way to bypass the syscall interface,
>>> which results in loss of information when tracing. This is a strace
>>> fragment on "io_uring-cp" from liburing:
>>>
>>> io_uring_enter(5, 40, 0, 0, NULL, 8) = 40
>>> io_uring_enter(5, 1, 0, 0, NULL, 8) = 1
>>> io_uring_enter(5, 1, 0, 0, NULL, 8) = 1
>>> ...
>>>
>>> What really happens are read + write requests. Without that
>>> information the strace output is mostly useless.
>>>
>>> This loss of information is not new, e.g. calls through the vdso or
>>> futex fast paths are also invisible to strace. But losing filesystem
>>> I/O calls are a major blow, imo.

To clarify the details for those who are not familiar with io_uring:

io_uring has a pair of queues, a submission queue (SQ) and a completion
queue (CQ), both shared between kernel and userspace. Userspace submits
requests by filling in a chunk of memory in the SQ. The kernel picks up
SQ entries either synchronously (in the io_uring_enter syscall) or
asynchronously by polling the SQ.

CQ entries are filled in by the kernel completely asynchronously and in
parallel. Some users just poll the CQ to get them, but there is also a
way to wait for them.

>>>
>>> What do people think?
>>>
>>> From what I can tell, listing the submitted requests on
>>> io_uring_enter() would not be hard. Request completion is
>>> asynchronous, however, and may not require io_uring_enter() syscall.
>>> Am I correct?

Neither the submission nor the completion side necessarily requires a
syscall.
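
To make the queue mechanics above concrete, here is a minimal
single-threaded toy model in C. It is only a sketch and not the real
io_uring ABI: every name in it (toy_ring, toy_sqe, toy_cqe, toy_submit,
toy_enter, toy_reap) is hypothetical, the "kernel" side is an ordinary
function, and the memory barriers a real shared ring needs are left
out. It is meant to show why a syscall tracer only ever sees a count:
the requests themselves are written to and read from shared memory.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRIES 8u                 /* power of two, so masking wraps */
#define TOY_MASK    (TOY_ENTRIES - 1u)

enum toy_op { TOY_OP_READ, TOY_OP_WRITE };

/* Hypothetical request and completion records (not the kernel layout). */
struct toy_sqe { enum toy_op op; int fd; uint32_t len; uint64_t user_data; };
struct toy_cqe { uint64_t user_data; int32_t res; };

/* Both rings live in one struct, standing in for the shared mapping. */
struct toy_ring {
    struct toy_sqe sq[TOY_ENTRIES];
    struct toy_cqe cq[TOY_ENTRIES];
    uint32_t sq_head, sq_tail;   /* head: "kernel" consumes, tail: user produces */
    uint32_t cq_head, cq_tail;   /* head: user consumes, tail: "kernel" produces */
};

/* User side: queue a request with plain memory writes, no syscall. */
static int toy_submit(struct toy_ring *r, struct toy_sqe sqe)
{
    if (r->sq_tail - r->sq_head == TOY_ENTRIES)
        return -1;                               /* SQ is full */
    r->sq[r->sq_tail & TOY_MASK] = sqe;
    r->sq_tail++;
    return 0;
}

/*
 * Stand-in for the kernel side: consume up to to_submit SQ entries and
 * post one CQ entry for each. Only the returned count would be visible
 * to a tracer that watches syscalls.
 */
static unsigned toy_enter(struct toy_ring *r, unsigned to_submit)
{
    unsigned done = 0;

    while (done < to_submit && r->sq_head != r->sq_tail) {
        /* Toy model: no CQ overflow handling, just insist on room. */
        assert(r->cq_tail - r->cq_head < TOY_ENTRIES);

        const struct toy_sqe *sqe = &r->sq[r->sq_head & TOY_MASK];
        struct toy_cqe *cqe = &r->cq[r->cq_tail & TOY_MASK];

        cqe->user_data = sqe->user_data;
        cqe->res = (int32_t)sqe->len;            /* pretend a full transfer */

        r->sq_head++;
        r->cq_tail++;
        done++;
    }
    return done;
}

/* User side: reap one completion with plain memory reads, no syscall. */
static int toy_reap(struct toy_ring *r, struct toy_cqe *out)
{
    if (r->cq_head == r->cq_tail)
        return 0;                                /* nothing completed yet */
    *out = r->cq[r->cq_head & TOY_MASK];
    r->cq_head++;
    return 1;
}
```

In this model, queueing a read and a write with toy_submit() and then
calling toy_enter(&ring, 2) returns 2, which is all a syscall-level
tracer would observe. The opcodes, fds and lengths never cross the
syscall boundary, and toy_reap() picks up the results without any call
at all.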
>>>
>>> Is there some existing tracing infrastructure that strace could use to
>>> get async completion events? Should we be introducing one?

There are static tracepoints covering all needs.

And if not used, the whole thing has to be zero-overhead. Otherwise
there is perf, which is zero-overhead, and this IMHO won't fly.

>>
>> Let’s add some seccomp folks. We probably also want to be able to run
>> seccomp-like filters on io_uring requests. So maybe io_uring should
>> call into seccomp-and-tracing code for each action.
>
> Adding Stefano since he had a complementary proposal for iouring
> restrictions that weren't exactly seccomp.
>

--
Pavel Begunkov