On 1/24/20 7:18 AM, Pavel Begunkov wrote:
> Apart from concurrent IO execution, io_uring allows issuing a sequence
> of operations, a.k.a. links, where requests are executed sequentially
> one after another. If an "error" happens, the rest of the link will be
> cancelled.
>
> The problem is what to consider an "error". For example, if we
> read fewer bytes than have been asked for, the link will be cancelled.
> It's necessary to play safe here, but this implies a lot of overhead if
> that isn't the desired behaviour. The user would need to reap all
> cancelled requests, analyse the state, resubmit them and suffer from
> context switches and all in-kernel preparation work. And there are
> dozens of possibly desirable patterns, so it's just not viable to
> hard-code them into the kernel.
>
> The other problem is to keep it running even when a request depends on
> a result of the previous one. It could be simply passing a return code
> or something more fancy, like reading from the userspace.
>
> And that's where BPF will be extremely useful. It will control the flow
> and do steering.
>
> The concept is to be able to run a BPF program after a request's
> completion, taking the request's state, and doing some of the following:
> 1. drop a link/request
> 2. issue new requests
> 3. link/unlink requests
> 4. do fast calculations / accumulate data
> 5. emit information to the userspace (e.g. via ring's CQ)
>
> With that, it will be possible to have almost context-switch-less IO,
> and that's really tempting considering how fast current devices are.
>
> What to discuss:
> 1. use cases
> 2. control flow for non-privileged users (e.g. allowing some popular
>    pre-registered patterns)
> 3. what input the program needs (e.g. last request's
>    io_uring_cqe) and how to pass it.
> 4. whether we need notification via CQ for each cancelled/requested
>    request, because sometimes they only add noise
> 5. BPF access to user data (e.g. allow reading only registered buffers)
> 6. implementation details. E.g.
>    - how to ask to run BPF (e.g. with a new opcode)
>    - having global BPF, bound to an io_uring instance or mixed
>    - program state and how to register
>    - rework notion of draining and sequencing
>    - live-lock avoidance (e.g. double check io_uring shut-down code)

I think this is a key topic that we should absolutely discuss at LSFMM.

--
Jens Axboe
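
For reference in the discussion, here is a minimal userspace model of the
link behaviour described in the quoted proposal: requests run in order, and
once one of them is judged an "error" the rest of the link completes with
-ECANCELED. It is only a sketch. Every name in it (sim_req, sim_hook,
sim_run_link, strict_hook, lenient_hook) is hypothetical and is not part of
the kernel, liburing or BPF; the hook is a plain C function pointer standing
in for the proposed per-completion BPF program, and it models only the
"drop a link" action from the list above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one linked request; not a kernel or liburing type. */
struct sim_req {
    int want; /* bytes asked for */
    int got;  /* what the simulated device returns (negative means -errno) */
    int res;  /* result the submitter would see after the link has run */
};

/* Verdict of a completion hook, standing in for the proposed BPF program. */
enum sim_verdict { SIM_CONTINUE, SIM_CANCEL };

typedef enum sim_verdict (*sim_hook)(const struct sim_req *done);

/* Policy from the proposal text: anything short of a full transfer
 * counts as an "error" and breaks the link. */
static enum sim_verdict strict_hook(const struct sim_req *done)
{
    return done->res == done->want ? SIM_CONTINUE : SIM_CANCEL;
}

/* Example of a custom policy: tolerate short transfers and stop only
 * on an error or a zero-length result. */
static enum sim_verdict lenient_hook(const struct sim_req *done)
{
    return done->res > 0 ? SIM_CONTINUE : SIM_CANCEL;
}

/* Run the link in order. After the hook returns SIM_CANCEL, every
 * remaining request is completed with -ECANCELED without running.
 * Returns the number of requests that actually executed. */
static size_t sim_run_link(struct sim_req *reqs, size_t n, sim_hook hook)
{
    size_t executed = 0;
    int cancelled = 0;

    assert(reqs != NULL || n == 0);
    assert(hook != NULL);

    for (size_t i = 0; i < n; i++) {
        if (cancelled) {
            reqs[i].res = -ECANCELED;
            continue;
        }
        reqs[i].res = reqs[i].got; /* "execute" the request */
        executed++;
        if (hook(&reqs[i]) == SIM_CANCEL)
            cancelled = 1;
    }
    return executed;
}
```

With a three-request link whose second request reads 1024 of 4096 bytes,
strict_hook leaves the third request cancelled, while lenient_hook lets the
whole link run. In the model, that difference is what the submitter would
otherwise have to recover by reaping and resubmitting from userspace.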