On 5/16/19 2:57 AM, Roman Penyaev wrote:
> Hi all,
>
> This is v3, which introduces pollable epoll from userspace.
>
> v3:
>  - Measurements made, represented below.
>
>  - Fix alignment of the epoll_uitem structure on all 64-bit archs
>    except x86-64. epoll_uitem should always be 16 bytes; a proper
>    BUILD_BUG_ON is added. (Linus)
>
>  - Check pollflags explicitly for 0 inside the work callback, and do
>    nothing if it is 0.
>
> v2:
>  - No reallocations: the max number of items (and thus the size of
>    the user ring) is specified by the caller.
>
>  - Interface is simplified: -ENOSPC is returned on an attempt to add
>    a new epoll item once the max number is reached, nothing more.
>
>  - Allocated pages are accounted using user->locked_vm and limited to
>    the RLIMIT_MEMLOCK value.
>
>  - EPOLLONESHOT is handled.
>
> This series introduces pollable epoll from userspace, i.e. the user
> creates an epfd with the new EPOLL_USERPOLL flag, mmaps the epoll
> descriptor, gets the header and ring pointers, and then consumes
> ready events from the ring, avoiding the epoll_wait() call. When the
> ring is empty, the user has to call epoll_wait() in order to wait for
> new events. epoll_wait() returns -ESTALE if the user ring still has
> events in it (a kind of indication that the user has to consume
> events from the user ring first; I could not invent anything better
> than returning -ESTALE).
>
> For the user header and user ring allocation I used vmalloc_user().
> I found that it is much easier to reuse remap_vmalloc_range_partial()
> than to deal with the page cache (like aio.c does). What is also nice
> is that the virtual address is properly aligned on SHMLBA, so there
> should not be any d-cache aliasing problems on archs with VIVT or
> VIPT caches.

Why aren't we just adding support to io_uring for this instead? Then
we don't need yet another entirely new ring that's just a little
different from what we have.

I haven't looked into the details of your implementation, just curious
if there's anything that makes using io_uring a non-starter for this
purpose?

-- 
Jens Axboe
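
For illustration, the consume-from-ring / fall-back-to-epoll_wait()
flow described in the quoted cover letter could look roughly like the
sketch below. Only EPOLL_USERPOLL, the mmap of the epfd, and the
-ESTALE convention come from the text above; the header/item layout,
the field names (head, tail, nr_items, ready_events, data), the
handle_event() helper, and the exact way epoll_wait() is invoked for a
userpoll epfd are hypothetical placeholders, not the ABI from the
patch series.

    #include <errno.h>
    #include <stdint.h>
    #include <sys/epoll.h>
    #include <sys/mman.h>

    /* Assumed single-producer (kernel) / single-consumer (user) ring
     * layout; the real structures live in the patch series, not here. */
    struct uring_item {
            uint32_t ready_events;
            uint64_t data;
    };

    struct uring_header {
            uint32_t head;          /* consumer index, advanced by user   */
            uint32_t tail;          /* producer index, advanced by kernel */
            uint32_t nr_items;
            struct uring_item items[];
    };

    void handle_event(uint64_t data, uint32_t events); /* app-defined */

    static int consume_events(int epfd, struct uring_header *hdr)
    {
            for (;;) {
                    uint32_t head = hdr->head;
                    /* tail is written by the kernel, so read it with an
                     * acquire load to pair with the kernel-side update */
                    uint32_t tail = __atomic_load_n(&hdr->tail,
                                                    __ATOMIC_ACQUIRE);

                    if (head == tail) {
                            /* Ring is empty: block in epoll_wait().  A
                             * failure with errno == ESTALE would mean
                             * events landed in the ring meanwhile and it
                             * must be drained first. */
                            struct epoll_event ev;
                            int n = epoll_wait(epfd, &ev, 1, -1);
                            if (n < 0 && errno == ESTALE)
                                    continue;
                            if (n < 0)
                                    return -1;
                            continue;
                    }

                    struct uring_item *item =
                            &hdr->items[head % hdr->nr_items];
                    handle_event(item->data, item->ready_events);
                    hdr->head = head + 1;
            }
    }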
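
On the kernel side, the allocation choice mentioned in the last quoted
paragraph (vmalloc_user() plus remap_vmalloc_range_partial(), rather
than aio.c-style page handling) might be wired up roughly as below.
The eventpoll_user container and its field names are invented for
illustration, and the four-argument remap_vmalloc_range_partial() form
shown matches kernels from around the time of this thread (later
kernels added a pgoff argument).

    #include <linux/errno.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    struct eventpoll_user {          /* hypothetical container */
            void   *user_header;     /* header + ring, from vmalloc_user() */
            size_t  user_size;
    };

    static int ep_user_alloc(struct eventpoll_user *eu, size_t size)
    {
            /* vmalloc_user() hands back zeroed pages that are safe to
             * map into userspace, which is what makes the simple remap
             * below possible. */
            eu->user_header = vmalloc_user(size);
            if (!eu->user_header)
                    return -ENOMEM;
            eu->user_size = size;
            return 0;
    }

    static int ep_user_mmap(struct eventpoll_user *eu,
                            struct vm_area_struct *vma)
    {
            size_t size = vma->vm_end - vma->vm_start;

            if (size > eu->user_size)
                    return -EINVAL;

            /* Map the vmalloc'ed area into the caller's VMA; per the
             * cover letter, the resulting user address is SHMLBA
             * aligned, avoiding d-cache aliasing on VIVT/VIPT caches. */
            return remap_vmalloc_range_partial(vma, vma->vm_start,
                                               eu->user_header, size);
    }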