[LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF

Apart from concurrent IO execution, io_uring allows issuing a sequence
of operations, a.k.a. links, where requests are executed sequentially one
after another. If an "error" happens, the rest of the link is
cancelled.

The problem is what to consider an "error". For example, if we
read fewer bytes than were asked for, the link will be cancelled.
It's necessary to play safe here, but this implies a lot of overhead if
that isn't the desired behaviour. The user would need to reap all
cancelled requests, analyse the state, resubmit them and suffer from
context switches and all in-kernel preparation work. And there are
dozens of possibly desirable patterns, so it's just not viable to
hard-code them into the kernel.
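
To make the short-read case concrete, here is a toy userspace model of
the link behaviour described above. It is not kernel code and does not
use the io_uring API; struct toy_req and toy_run_link() are made-up
names for illustration. It only assumes that a request returning fewer
bytes than requested breaks the link, and that the remaining requests
complete with -ECANCELED.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one linked read request. */
struct toy_req {
	int want;	/* bytes requested */
	int got;	/* bytes the simulated device delivers (<0: error) */
	int res;	/* result posted for this request */
};

/*
 * Execute a chain in order. A request breaks the link if it returns
 * an error or fewer bytes than requested; every later request is then
 * completed with -ECANCELED without being executed.
 * Returns the number of requests that actually ran.
 */
static size_t toy_run_link(struct toy_req *reqs, size_t n)
{
	size_t executed = 0;
	int broken = 0;

	assert(reqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (broken) {
			reqs[i].res = -ECANCELED;
			continue;
		}
		reqs[i].res = reqs[i].got;
		executed++;
		if (reqs[i].got < reqs[i].want)
			broken = 1;
	}
	return executed;
}
```

With a chain of three 4096-byte reads where the second one returns
only 1024 bytes, the third is cancelled even though nothing actually
failed, and userspace has to reap it and resubmit.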

The other problem is to keep the link running even when a request
depends on the result of the previous one. It could be as simple as
passing a return code, or something fancier, like reading from userspace.

And that's where BPF will be extremely useful. It will control the flow
and do steering.

The concept is to be able to run a BPF program after a request's
completion, taking the request's state, and doing some of the following:
1. drop a link/request
2. issue new requests
3. link/unlink requests
4. do fast calculations / accumulate data
5. emit information to the userspace (e.g. via ring's CQ)
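
A rough sketch of what such a completion hook could decide, written as
ordinary C rather than real BPF. Everything here is an assumption made
for illustration: the toy_* types, the action codes and the hook
signature do not exist anywhere, and what the program's input and
interface should actually look like is one of the open questions
below. The example handles the short-read case by asking for the
remainder instead of cancelling the link.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical verdicts a completion hook could return. */
enum toy_action {
	TOY_CONTINUE,	/* let the next linked request run */
	TOY_DROP_LINK,	/* cancel the rest of the link */
	TOY_RESUBMIT,	/* issue a new request described by *next */
};

/* Hypothetical hook input, loosely modelled on a completion entry. */
struct toy_cqe {
	uint64_t user_data;
	int32_t res;
};

/* Per-link state accumulated across invocations. */
struct toy_state {
	uint32_t want;	/* total bytes the link step should read */
	uint32_t done;	/* bytes read so far */
};

/* Description of the follow-up request for TOY_RESUBMIT. */
struct toy_resubmit {
	uint32_t offset;
	uint32_t len;
};

static enum toy_action toy_on_complete(const struct toy_cqe *cqe,
				       struct toy_state *st,
				       struct toy_resubmit *next)
{
	if (cqe->res <= 0)
		return TOY_DROP_LINK;	/* real error or EOF */

	st->done += (uint32_t)cqe->res;
	if (st->done < st->want) {
		/* short read: ask for the remainder, keep the link alive */
		next->offset = st->done;
		next->len = st->want - st->done;
		return TOY_RESUBMIT;
	}
	return TOY_CONTINUE;
}
```

The point is that the policy (here, "retry short reads") lives in the
program and not in the kernel, so other patterns only need a different
program.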

With that, it will be possible to have almost context-switch-less IO,
and that's really tempting considering how fast current devices are.

What to discuss:
1. use cases
2. control flow for non-privileged users (e.g. allowing some popular
   pre-registered patterns)
3. what input the program needs (e.g. last request's
   io_uring_cqe) and how to pass it.
4. whether we need notification via CQ for each cancelled/requested
   request, because sometimes they only add noise
5. BPF access to user data (e.g. allowing reads only from registered
   buffers)
6. implementation details. E.g.
   - how to ask to run BPF (e.g. with a new opcode)
   - whether BPF programs are global, bound to an io_uring instance,
     or mixed
   - program state and how to register
   - rework notion of draining and sequencing
   - live-lock avoidance (e.g. double check io_uring shut-down code)


-- 
Pavel Begunkov
