Re: [PATCH v3 06/13] epoll: introduce helpers for adding/removing events to uring


 



On 2019-05-31 14:56, Peter Zijlstra wrote:
On Fri, May 31, 2019 at 01:15:21PM +0200, Roman Penyaev wrote:
On 2019-05-31 11:56, Peter Zijlstra wrote:
> On Thu, May 16, 2019 at 10:58:03AM +0200, Roman Penyaev wrote:
> > +static inline bool ep_add_event_to_uring(struct epitem *epi,
> > __poll_t pollflags)
> > +{
> > +	struct eventpoll *ep = epi->ep;
> > +	struct epoll_uitem *uitem;
> > +	bool added = false;
> > +
> > +	if (WARN_ON(!pollflags))
> > +		return false;
> > +
> > +	uitem = &ep->user_header->items[epi->bit];
> > +	/*
> > +	 * Can be represented as:
> > +	 *
> > +	 *    was_ready = uitem->ready_events;
> > +	 *    uitem->ready_events &= ~EPOLLREMOVED;
> > +	 *    uitem->ready_events |= pollflags;
> > +	 *    if (!was_ready) {
> > +	 *         // create index entry
> > +	 *    }
> > +	 *
> > +	 * See the big comment inside ep_remove_user_item(), why it is
> > +	 * important to mask EPOLLREMOVED.
> > +	 */
> > +	if (!atomic_or_with_mask(&uitem->ready_events,
> > +				 pollflags, EPOLLREMOVED)) {
> > +		unsigned int i, *item_idx, index_mask;
> > +
> > +		/*
> > +		 * Item was not ready before, thus we have to insert
> > +		 * new index to the ring.
> > +		 */
> > +
> > +		index_mask = ep_max_index_nr(ep) - 1;
> > +		i = __atomic_fetch_add(&ep->user_header->tail, 1,
> > +				       __ATOMIC_ACQUIRE);
> > +		item_idx = &ep->user_index[i & index_mask];
> > +
> > +		/* Signal with a bit, which is > 0 */
> > +		*item_idx = epi->bit + 1;
>
> Did you just increment the user visible tail pointer before you filled
> the data? That is, can the concurrent userspace observe the increment
> before you put credible data in its place?

No, the "data" is the "ready_events" mask, which was updated earlier
with cmpxchg inside the atomic_or_with_mask() call.  All I need is to
put the index of the just-updated item into the uring.

Userspace, in turn, gets the index from the ring and then checks
the mask.

But where do you write the index into the shared memory? That index
should be written before you publish the new tail.
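
For reference, the conventional ordering described here can be sketched for the simple single-producer case roughly as below. This is only an illustration, not code from the patch: struct demo_ring, demo_publish(), demo_consume() and RING_NR are hypothetical names, and the GCC/Clang __atomic builtins stand in for the kernel primitives.

```c
#include <assert.h>

#define RING_NR 8u	/* hypothetical ring size, power of two */

struct demo_ring {
	unsigned int tail;		/* published to the consumer */
	unsigned int index[RING_NR];	/* slots holding item indices */
};

/* Single producer: fill the slot first, then publish the new tail. */
static void demo_publish(struct demo_ring *r, unsigned int bit)
{
	unsigned int t = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);

	assert(bit + 1 != 0);	/* stored value must stay > 0, as in the patch */
	r->index[t & (RING_NR - 1)] = bit + 1;
	/* Release store: the slot write above is visible before the tail. */
	__atomic_store_n(&r->tail, t + 1, __ATOMIC_RELEASE);
}

/* Consumer: returns 1 and fills *bit, or 0 if nothing is published. */
static int demo_consume(struct demo_ring *r, unsigned int *head,
			unsigned int *bit)
{
	/* Acquire load pairs with the release store in demo_publish(). */
	unsigned int t = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

	if (*head == t)
		return 0;
	*bit = r->index[*head & (RING_NR - 1)] - 1;
	(*head)++;
	return 1;
}
```

With several concurrent producers the plain "load tail, store tail + 1" step no longer works, which is the problem the rest of this mail is about.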

ep_add_event_to_uring() is lockless, so I can't increase the tail
afterwards: I first need to reserve the index slot to write to.  I can
use a shadow tail, which is not seen by userspace, but then I have to
guarantee that the tail is updated from the shadow tail *after* all
callers of ep_add_event_to_uring() have left.  That is possible, please
see the code below, but it adds more complexity:

(the code was tested on the user side, thus it uses C11-style atomic
builtins)

static inline void add_event__kernel(struct ring *ring, unsigned bit)
{
        unsigned i, cntr, commit_cntr, *item_idx, tail, old;

        i = __atomic_fetch_add(&ring->cntr, 1, __ATOMIC_ACQUIRE);
        item_idx = &ring->user_itemsindex[i % ring->nr];

        /* Update data */
        *item_idx = bit;

        commit_cntr = __atomic_add_fetch(&ring->commit_cntr, 1,
                                         __ATOMIC_RELEASE);

        tail = ring->user_header->tail;
        rmb();
        do {
                cntr = ring->cntr;
                if (cntr != commit_cntr)
                        /* Someone else will advance tail */
                        break;

                old = tail;

        } while ((tail = __sync_val_compare_and_swap(&ring->user_header->tail,
                                                     old, cntr)) != old);
}
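
To look at the commit rule in isolation, here is a minimal sketch that splits the function above into a reserve step and a commit step, so that two "in-flight" writers can be interleaved deterministically from a single thread. The names (struct demo_shadow_ring, demo_reserve(), demo_commit(), SHADOW_NR) are made up for the illustration, and the CAS loop is written with __atomic_compare_exchange_n instead of the __sync builtin; it is meant to show the idea, not to replace the code above.

```c
#include <assert.h>

#define SHADOW_NR 8u	/* hypothetical ring size */

struct demo_shadow_ring {
	unsigned int cntr;		/* shadow tail: slots reserved so far */
	unsigned int commit_cntr;	/* slots completely written */
	unsigned int tail;		/* the only counter userspace looks at */
	unsigned int index[SHADOW_NR];
};

/* Step 1: reserve a slot; nothing becomes visible to the consumer yet. */
static unsigned int demo_reserve(struct demo_shadow_ring *r)
{
	return __atomic_fetch_add(&r->cntr, 1, __ATOMIC_ACQUIRE);
}

/* Step 2: fill the slot, then advance tail only if no writer is pending. */
static void demo_commit(struct demo_shadow_ring *r, unsigned int i,
			unsigned int bit)
{
	unsigned int cntr, commit_cntr, tail;

	tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);
	/* More than SHADOW_NR in-flight writers would collide on a slot. */
	assert(i - tail < SHADOW_NR);

	r->index[i % SHADOW_NR] = bit;
	commit_cntr = __atomic_add_fetch(&r->commit_cntr, 1, __ATOMIC_RELEASE);

	do {
		cntr = __atomic_load_n(&r->cntr, __ATOMIC_ACQUIRE);
		if (cntr != commit_cntr)
			return;	/* someone else will advance tail */
		/* On failure the builtin reloads the current tail for us. */
	} while (!__atomic_compare_exchange_n(&r->tail, &tail, cntr, 0,
					      __ATOMIC_RELEASE,
					      __ATOMIC_RELAXED));
}
```

If writers A and B both reserve and B commits first, tail stays where it was, because cntr is 2 while commit_cntr is only 1; once A commits as well, the counters match and tail jumps to 2 in one step.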

Another way (the current solution) is to spin on the userspace side until
a valid index shows up in the slot (a valid index is always > 0), i.e.:

	item_idx_ptr = &index[idx & indeces_mask];

	/*
	 * Spin here till we see valid index
	 */
	while (!(idx = __atomic_load_n(item_idx_ptr, __ATOMIC_ACQUIRE)))
		;
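
Put together, the current scheme could look roughly like the sketch below from both sides. Everything here is illustrative: struct demo_uring, demo_add_event(), demo_get_event() and URING_NR are hypothetical names, a plain atomic OR stands in for atomic_or_with_mask(), and resetting the slot to 0 and fetching the mask with an atomic exchange on the consumer side are assumptions made for the example, not a description of the actual userspace code.

```c
#include <assert.h>

#define URING_NR 4u	/* hypothetical number of slots, power of two */

struct demo_uitem {
	unsigned int ready_events;	/* mask, updated before the index */
};

struct demo_uring {
	unsigned int tail;		/* incremented by the producer first */
	unsigned int head;		/* consumer position */
	unsigned int index[URING_NR];	/* 0 means "not written yet" */
	struct demo_uitem items[URING_NR];
};

/* Producer order as in the quoted patch: mask, then tail, then index. */
static void demo_add_event(struct demo_uring *r, unsigned int bit,
			   unsigned int events)
{
	unsigned int i;

	assert(bit < URING_NR);
	/* Only an item that was not ready before gets a new index entry. */
	if (__atomic_fetch_or(&r->items[bit].ready_events, events,
			      __ATOMIC_ACQ_REL))
		return;

	i = __atomic_fetch_add(&r->tail, 1, __ATOMIC_ACQUIRE);
	/* Signal with a value which is > 0 */
	__atomic_store_n(&r->index[i & (URING_NR - 1)], bit + 1,
			 __ATOMIC_RELEASE);
}

/* Consumer: returns 1 and fills *bit and *events, or 0 if ring is empty. */
static int demo_get_event(struct demo_uring *r, unsigned int *bit,
			  unsigned int *events)
{
	unsigned int idx, *slot;

	if (r->head == __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE))
		return 0;

	slot = &r->index[r->head++ & (URING_NR - 1)];
	/* Tail may be visible before the slot is filled: spin for > 0. */
	while (!(idx = __atomic_load_n(slot, __ATOMIC_ACQUIRE)))
		;
	/* Assumption: the consumer clears the slot so it can be reused. */
	__atomic_store_n(slot, 0, __ATOMIC_RELAXED);

	*bit = idx - 1;
	/* Assumption: the mask is fetched and cleared in one atomic step. */
	*events = __atomic_exchange_n(&r->items[*bit].ready_events, 0,
				      __ATOMIC_ACQ_REL);
	return 1;
}
```

The spin is short in practice, since the producer writes the slot right after bumping tail, but it does mean the consumer depends on the producer making progress.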



So of course the tail can be updated afterwards, like you mentioned, but
then I have to introduce locks.  I want to keep the hot event path
lockless.

--
Roman




