Apart from concurrent IO execution, io_uring allows issuing a sequence of
operations, a.k.a. links, where requests are executed sequentially, one after
another. If an "error" happens, the rest of the link is cancelled.

The problem is what to consider an "error". For example, if we read fewer
bytes than were asked for, the link is cancelled. It's necessary to play it
safe here, but this implies a lot of overhead if that isn't the desired
behaviour. The user would need to reap all cancelled requests, analyse the
state, resubmit them, and suffer from context switches and all the in-kernel
preparation work. And there are dozens of possibly desirable patterns, so
it's just not viable to hard-code them into the kernel.

The other problem is keeping a link running even when a request depends on
the result of the previous one. It could be as simple as passing a return
code, or something fancier, like reading from userspace.

And that's where BPF will be extremely useful. It will control the flow and
do the steering.

The concept is to be able to run a BPF program after a request's completion,
taking the request's state and doing some of the following:
1. drop a link/request
2. issue new requests
3. link/unlink requests
4. do fast calculations / accumulate data
5. emit information to userspace (e.g. via the ring's CQ)

With that, it will be possible to have almost context-switch-less IO, and
that's really tempting considering how fast current devices are.

What to discuss:
1. use cases
2. control flow for non-privileged users (e.g. allowing some popular
   pre-registered patterns)
3. what input the program needs (e.g. the last request's io_uring_cqe) and
   how to pass it
4. whether we need a notification via CQ for each cancelled/requested
   request, because sometimes they only add noise
5. BPF access to user data (e.g. allow reading only registered buffers)
6. implementation details, e.g.
   - how to ask to run BPF (e.g. with a new opcode)
   - whether BPF programs are global, bound to an io_uring instance, or mixed
   - program state and how to register it
   - reworking the notion of draining and sequencing
   - live-lock avoidance (e.g. double-check the io_uring shut-down code)

-- 
Pavel Begunkov
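
For illustration only, the control flow described above can be modelled in
plain userspace C. This is a toy sketch, not io_uring or BPF code: every name
in it (toy_req, step_fn, strict_policy, resume_short_policy, run_link) is
hypothetical, and completion results are fed from a scripted array rather
than from real IO. It contrasts the current behaviour (a short result cancels
the rest of the link) with a per-completion hook that resubmits the remainder
instead.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in io_uring or BPF. */

enum step_action { STEP_CONTINUE, STEP_CANCEL_REST, STEP_RETRY };

struct toy_req {
    int wanted;     /* bytes asked for */
    int done;       /* bytes completed so far */
    int cancelled;  /* set if the request never finished */
};

/* Hypothetical hook run after each completion, standing in for the
 * BPF program. It sees the request state *before* res is applied. */
typedef enum step_action (*step_fn)(const struct toy_req *req, int res);

/* Behaviour described in the text: anything but a full result
 * cancels the rest of the link. */
static enum step_action strict_policy(const struct toy_req *req, int res)
{
    int remaining = req->wanted - req->done;

    return res == remaining ? STEP_CONTINUE : STEP_CANCEL_REST;
}

/* One possible steering policy: resubmit the remainder of a short
 * result, cancel only on an error or a zero-length result. */
static enum step_action resume_short_policy(const struct toy_req *req, int res)
{
    int remaining = req->wanted - req->done;

    if (res <= 0)
        return STEP_CANCEL_REST;
    return res < remaining ? STEP_RETRY : STEP_CONTINUE;
}

/* Run the link in order, consuming one scripted result per issue.
 * Returns the number of requests that end up cancelled. The loop is
 * bounded by nres, so a policy that always retries cannot spin forever. */
static size_t run_link(struct toy_req *reqs, size_t n,
                       const int *results, size_t nres, step_fn hook)
{
    size_t i = 0, r = 0;

    assert(hook != NULL);
    while (i < n && r < nres) {
        int res = results[r++];
        enum step_action act = hook(&reqs[i], res);

        if (res > 0)
            reqs[i].done += res;
        if (act == STEP_RETRY)
            continue;       /* reissue the same request for the remainder */
        i++;                /* this request is finished, fully or not */
        if (act == STEP_CANCEL_REST)
            break;
    }
    /* Everything that did not finish is cancelled. */
    for (size_t j = i; j < n; j++)
        reqs[j].cancelled = 1;
    return n - i;
}
```

With three 4096-byte requests and scripted results 4096, 1024, 3072, 4096,
strict_policy stops after the short second request and cancels the third,
while resume_short_policy completes all three without a round trip through
the submitter.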