Re: [PATCH v3 00/13] epoll: support pollable epoll from userspace

On 2019-05-31 16:48, Jens Axboe wrote:
On 5/16/19 2:57 AM, Roman Penyaev wrote:
Hi all,

This is v3 which introduces pollable epoll from userspace.

v3:
  - Measurements made, represented below.

  - Fix alignment for epoll_uitem structure on all 64-bit archs except
    x86-64. epoll_uitem should always be 16 bytes; a proper BUILD_BUG_ON
    is added. (Linus)

  - Check pollflags explicitly for 0 inside the work callback, and do
    nothing if 0.
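
A minimal sketch of the kind of compile-time size check the alignment
item above refers to. The struct name and its field names are
assumptions made for illustration, not the layout from the patch, and
_Static_assert stands in for the kernel's BUILD_BUG_ON:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical item layout (names are assumptions, not taken from the
 * patch): two 32-bit fields followed by one 64-bit field, so there is
 * no implicit padding and the size is the same on every architecture.
 */
struct uitem_sketch {
	uint32_t ready_events;
	uint32_t events;
	uint64_t data;
};

/* Userspace stand-in for BUILD_BUG_ON(sizeof(...) != 16). */
_Static_assert(sizeof(struct uitem_sketch) == 16,
	       "item must be exactly 16 bytes");
_Static_assert(offsetof(struct uitem_sketch, data) == 8,
	       "64-bit field must start at offset 8");
```

If the fields were reordered so that a 32-bit field followed the 64-bit
one, some ABIs would add tail padding and others would not, which is the
sort of mismatch such a check is meant to catch at build time.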

v2:
  - No reallocations; the max number of items (thus size of the user ring)
    is specified by the caller.

  - Interface is simplified: -ENOSPC is returned on an attempt to add a new
    epoll item if the number of items has reached the max, nothing more.

  - Alloced pages are accounted using user->locked_vm and limited to
    RLIMIT_MEMLOCK value.

  - EPOLLONESHOT is handled.

This series introduces pollable epoll from userspace, i.e. the user creates
an epfd with a new EPOLL_USERPOLL flag, mmaps the epoll descriptor, gets
header and ring pointers and then consumes ready events from the ring,
avoiding the epoll_wait() call.  When the ring is empty, the user has to
call epoll_wait() in order to wait for new events.  epoll_wait() returns
-ESTALE if the user ring still has events (a kind of indication that the
user has to consume events from the user ring first; I could not invent
anything better than returning -ESTALE).
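
The consume-then-wait flow described above can be sketched as a small
single-threaded simulation. Everything here is hypothetical: the struct
layouts, RING_SLOTS, and the ring_produce()/ring_consume()/wait_sketch()
helpers are illustration names, not the interface from the series, and
a real shared ring would also need memory barriers between producer and
consumer:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RING_SLOTS 8u	/* hypothetical ring size, power of two */

struct event_sketch {
	uint32_t events;
	uint64_t data;
};

struct ring_sketch {
	uint32_t head;	/* advanced by the consumer (userspace) */
	uint32_t tail;	/* advanced by the producer (the kernel, in the real design) */
	struct event_sketch slots[RING_SLOTS];
};

/* Stand-in for the producer side: returns 1 on success, 0 if the ring is full. */
static int ring_produce(struct ring_sketch *r, uint32_t events, uint64_t data)
{
	if (r->tail - r->head == RING_SLOTS)
		return 0;
	r->slots[r->tail % RING_SLOTS] = (struct event_sketch){ events, data };
	r->tail++;
	return 1;
}

/* Consumer side: copies one ready event out; returns 0 when the ring is empty. */
static int ring_consume(struct ring_sketch *r, struct event_sketch *out)
{
	if (r->head == r->tail)
		return 0;
	*out = r->slots[r->head % RING_SLOTS];
	r->head++;
	return 1;
}

/*
 * Models only the return-value rule of the wait call: -ESTALE while
 * unconsumed events remain in the ring, 0 where the real call would
 * go on to block waiting for new events.
 */
static int wait_sketch(const struct ring_sketch *r)
{
	return r->head != r->tail ? -ESTALE : 0;
}
```

An event loop built on this would drain with ring_consume() until it
returns 0 and only then call the wait function; getting -ESTALE back
means the loop tried to sleep while events were still pending.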

For user header and user ring allocation I used vmalloc_user(). I found
that it is much easier to reuse remap_vmalloc_range_partial() instead of
dealing with page cache (like aio.c does).  What is also nice is that
virtual address is properly aligned on SHMLBA, thus there should not be
any d-cache aliasing problems on archs with vivt or vipt caches.

Why aren't we just adding support to io_uring for this instead? Then we
don't need yet another entirely new ring, that is just a little
different from what we have.

I haven't looked into the details of your implementation, just curious
if there's anything that makes using io_uring a non-starter for this
purpose?

Afaict the main difference is that you do not need to recharge an fd
(submit a new poll request in terms of io_uring): once an fd has been
added to epoll with epoll_ctl(), we get events. When you have thousands
of fds, that should matter.

Another interesting question is how difficult it is to modify existing
event loops in event libraries in order to support recharging
(EPOLLONESHOT in terms of epoll).
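
To make "recharging" concrete, here is a minimal sketch using the
existing epoll API only (nothing from this series). The helper names
add_oneshot(), rearm() and poll_once() are made up for the example:
with EPOLLONESHOT an fd is reported once and then stays disarmed until
it is re-armed with EPOLL_CTL_MOD, which costs one extra syscall per
event per fd:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for a single EPOLLIN notification. */
static int add_oneshot(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* "Recharge" the fd: re-arm it after its one-shot event was delivered. */
static int rearm(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };

	return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Non-blocking check: number of ready events (0 or 1), or -1 on error. */
static int poll_once(int epfd)
{
	struct epoll_event ev;

	return epoll_wait(epfd, &ev, 1, 0);
}
```

With a pipe that has one unread byte, poll_once() reports the read end
once, then reports nothing until rearm() is called, even though the data
is still readable. Without EPOLLONESHOT the fd would simply stay
registered and keep being reported.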

Maybe Azat who maintains libevent can shed light on this (currently I see
that libevent does not support "EPOLLONESHOT" logic).


--
Roman




