On 2019-05-31 14:56, Peter Zijlstra wrote:
On Fri, May 31, 2019 at 01:15:21PM +0200, Roman Penyaev wrote:
On 2019-05-31 11:56, Peter Zijlstra wrote:
> On Thu, May 16, 2019 at 10:58:03AM +0200, Roman Penyaev wrote:
> > +static inline bool ep_add_event_to_uring(struct epitem *epi,
> > __poll_t pollflags)
> > +{
> > + struct eventpoll *ep = epi->ep;
> > + struct epoll_uitem *uitem;
> > + bool added = false;
> > +
> > + if (WARN_ON(!pollflags))
> > + return false;
> > +
> > + uitem = &ep->user_header->items[epi->bit];
> > + /*
> > + * Can be represented as:
> > + *
> > + * was_ready = uitem->ready_events;
> > + * uitem->ready_events &= ~EPOLLREMOVED;
> > + * uitem->ready_events |= pollflags;
> > + * if (!was_ready) {
> > + * // create index entry
> > + * }
> > + *
> > + * See the big comment inside ep_remove_user_item(), why it is
> > + * important to mask EPOLLREMOVED.
> > + */
> > + if (!atomic_or_with_mask(&uitem->ready_events,
> > + pollflags, EPOLLREMOVED)) {
> > + unsigned int i, *item_idx, index_mask;
> > +
> > + /*
> > + * Item was not ready before, thus we have to insert
> > + * new index to the ring.
> > + */
> > +
> > + index_mask = ep_max_index_nr(ep) - 1;
> > + i = __atomic_fetch_add(&ep->user_header->tail, 1,
> > + __ATOMIC_ACQUIRE);
> > + item_idx = &ep->user_index[i & index_mask];
> > +
> > + /* Signal with a bit, which is > 0 */
> > + *item_idx = epi->bit + 1;
>
> Did you just increment the user visible tail pointer before you filled
> the data? That is, can the concurrent userspace observe the increment
> before you put credible data in its place?
No, the "data" is the "ready_events" mask, which was updated beforehand by
the cmpxchg-based atomic_or_with_mask() call. All I need is to put the index
of the just-updated item into the uring.
Userspace, in its turn, gets the index from the ring and then checks
the mask.
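For reference, the semantics spelled out in the patch comment (clear EPOLLREMOVED, OR in pollflags, report whether the item was already ready) can be expressed as a compare-and-swap loop. This is a hypothetical userspace sketch with C11 atomics: the real atomic_or_with_mask() is not shown in this thread, and both the helper name and the EPOLLREMOVED_SKETCH value below are assumptions.

```c
#include <stdatomic.h>

/* Assumed bit value, for illustration only; not taken from the patch. */
#define EPOLLREMOVED_SKETCH (1u << 27)

/*
 * Hypothetical equivalent of atomic_or_with_mask(): atomically computes
 * new = (old & ~mask) | bits and returns the old value, so a zero return
 * means "was not ready before, the caller must insert an index".
 */
static unsigned int or_with_mask_sketch(_Atomic unsigned int *v,
					unsigned int bits, unsigned int mask)
{
	unsigned int old = atomic_load_explicit(v, memory_order_relaxed);
	unsigned int new;

	do {
		new = (old & ~mask) | bits;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, new,
							memory_order_acq_rel,
							memory_order_relaxed));
	return old;
}
```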
But where do you write the index into the shared memory? That index
should be written before you publish the new tail.
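The ordering asked for here is the usual publication rule: fill the slot first, then make the new tail visible with a release store, and have the reader load the tail with acquire semantics before reading the slot. A minimal single-producer sketch with C11 atomics; the ring layout and names (spsc_ring, RING_NR, ring_push, ring_pop) are hypothetical, not the patch's.

```c
#include <stdatomic.h>

#define RING_NR 64u /* hypothetical ring size, must be a power of two */

struct spsc_ring {
	unsigned int slots[RING_NR];
	_Atomic unsigned int head; /* advanced only by the consumer */
	_Atomic unsigned int tail; /* advanced only by the producer */
};

/* Producer: write the slot, then publish it with a release store. */
static void ring_push(struct spsc_ring *r, unsigned int val)
{
	unsigned int t = atomic_load_explicit(&r->tail, memory_order_relaxed);

	/* Wait for free space; the consumer releases head after reading. */
	while (t - atomic_load_explicit(&r->head, memory_order_acquire) == RING_NR)
		;
	r->slots[t & (RING_NR - 1)] = val;			/* data first ... */
	atomic_store_explicit(&r->tail, t + 1, memory_order_release); /* ... then tail */
}

/* Consumer: acquire the tail; the slot contents are then guaranteed visible. */
static unsigned int ring_pop(struct spsc_ring *r)
{
	unsigned int h = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int val;

	while (atomic_load_explicit(&r->tail, memory_order_acquire) == h)
		;
	val = r->slots[h & (RING_NR - 1)];
	atomic_store_explicit(&r->head, h + 1, memory_order_release);
	return val;
}
```

This only works because there is a single producer that owns the tail; with several lockless producers each one only owns its own slot, which is the difficulty discussed in the rest of the reply.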
ep_add_event_to_uring() is lockless, so I can't increase the tail afterwards:
I need to reserve the index slot to write to first. I could use a shadow tail,
which is not seen by userspace, but then I have to guarantee that the tail is
updated from the shadow tail only *after* all callers of
ep_add_event_to_uring() have left.
That is possible (please see the code below), but it adds more complexity
(the code was tested in userspace, hence the C11-style atomics):
static inline void add_event__kernel(struct ring *ring, unsigned bit)
{
	unsigned i, cntr, commit_cntr, *item_idx, tail, old;

	i = __atomic_fetch_add(&ring->cntr, 1, __ATOMIC_ACQUIRE);
	item_idx = &ring->user_itemsindex[i % ring->nr];

	/* Update data */
	*item_idx = bit;

	commit_cntr = __atomic_add_fetch(&ring->commit_cntr, 1,
					 __ATOMIC_RELEASE);

	tail = ring->user_header->tail;
	rmb();
	do {
		cntr = ring->cntr;
		if (cntr != commit_cntr)
			/* Someone else will advance tail */
			break;
		old = tail;
	} while ((tail = __sync_val_compare_and_swap(&ring->user_header->tail,
						     old, cntr)) != old);
}
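To exercise the shadow-tail scheme above outside the kernel, here is a self-contained port using C11 <stdatomic.h>, with sequentially consistent operations standing in for the GCC builtins and rmb(). The struct layout, sizes and names (sketch_ring, sketch_add_event, SKETCH_NR) are assumptions made for testing; the ring is sized so it never wraps, and no consumer is modelled.

```c
#include <stdatomic.h>

#define SKETCH_NR 4096u /* hypothetical ring size; the test never wraps it */

struct sketch_ring {
	_Atomic unsigned int cntr;        /* slots reserved (the shadow tail) */
	_Atomic unsigned int commit_cntr; /* slots whose data has been written */
	_Atomic unsigned int tail;        /* what userspace would see */
	unsigned int idx[SKETCH_NR];
};

static void sketch_add_event(struct sketch_ring *r, unsigned int bit)
{
	unsigned int i, cntr, commit, tail;

	/* Reserve a private slot and fill it. */
	i = atomic_fetch_add(&r->cntr, 1);
	r->idx[i % SKETCH_NR] = bit;

	/* Count the slot as committed; the seq_cst RMW releases the store above. */
	commit = atomic_fetch_add(&r->commit_cntr, 1) + 1;

	tail = atomic_load(&r->tail);
	do {
		cntr = atomic_load(&r->cntr);
		if (cntr != commit)
			break; /* a later reserver will advance the tail */
		/* On CAS failure 'tail' is reloaded and the check is repeated. */
	} while (!atomic_compare_exchange_weak(&r->tail, &tail, cntr));
}
```

Individual producers may leave the tail behind for a later one to advance, but once all of them have returned, the tail equals the number of added events.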
Another way (the current solution) is to spin on the userspace side until
the index is > 0 (a valid index is always > 0), i.e.:
	item_idx_ptr = &index[idx & indeces_mask];

	/*
	 * Spin here till we see valid index
	 */
	while (!(idx = __atomic_load_n(item_idx_ptr, __ATOMIC_ACQUIRE)))
		;
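A fuller sketch of this current scheme: the producer bumps the user-visible tail first and writes bit + 1 into the reserved slot afterwards, and the consumer spins on that slot until it becomes non-zero, then clears it so the slot can be reused. The names (uring_sketch, uring_add, uring_get, UR_NR), the single consumer, the use of C11 atomics in place of the kernel primitives, and the lack of overflow handling are all assumptions for illustration.

```c
#include <stdatomic.h>

#define UR_NR 8192u /* hypothetical index ring size, must be a power of two */

struct uring_sketch {
	_Atomic unsigned int tail;         /* bumped by producers, user visible */
	unsigned int head;                 /* private to the single consumer */
	_Atomic unsigned int index[UR_NR]; /* 0 means "not written yet" */
};

/* Producer: reserve a slot through the visible tail, then fill it. */
static void uring_add(struct uring_sketch *u, unsigned int bit)
{
	unsigned int i = atomic_fetch_add_explicit(&u->tail, 1,
						   memory_order_relaxed);

	/* Store bit + 1 so that a valid index is always > 0. */
	atomic_store_explicit(&u->index[i & (UR_NR - 1)], bit + 1,
			      memory_order_release);
}

/* Consumer: returns 0 if the ring is empty, else 1 with the bit in *bit. */
static int uring_get(struct uring_sketch *u, unsigned int *bit)
{
	_Atomic unsigned int *slot;
	unsigned int idx;

	if (u->head == atomic_load_explicit(&u->tail, memory_order_acquire))
		return 0;
	slot = &u->index[u->head & (UR_NR - 1)];
	/* Spin till we see a valid index: the producer may still be writing. */
	while (!(idx = atomic_load_explicit(slot, memory_order_acquire)))
		;
	atomic_store_explicit(slot, 0, memory_order_relaxed); /* mark slot free */
	u->head++;
	*bit = idx - 1;
	return 1;
}
```

The producer side stays a single fetch-add plus a store; the price is that the consumer may briefly spin on a slot whose tail increment is already visible but whose index has not been written yet.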
So of course the tail can be updated afterwards, as you mentioned, but then I
have to introduce locks. I want to keep the hot event path lockless.
--
Roman