Re: [RFC v3 1/2] epoll: avoid spinlock contention with wfcqueue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 22, 2013 at 12:24 PM, Eric Wong <normalperson@xxxxxxxx> wrote:
...
> Perhaps just using epitem->ws and removing ep->ws can work.
>
> I think the following change to keep wakeup_source in sync with
> epi->state is sufficient to prevent suspend.
>
> But I'm not familiar with suspend.  Is it possible to suspend while
> a) spinning on a lock?
> b) holding a spinlock?
>

If you code cannot be preempted suspend will not complete. However, if
you drop the last wakeup_source that was active, you may trigger a new
suspend attempt, so it is best to avoid this.

> Since we avoid spinlocks in the main ep_poll_callback path, maybe the
> chance of entering suspend is reduced anyways since we may activate
> the ws sooner.
>
> What do you think?
>

I think you should make sure ep_poll_callback does not return without
an active wakeup_source if EPOLLWAKEUP is set.

> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 1e04175..531ad46 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -214,9 +214,6 @@ struct eventpoll {
>         /* RB tree root used to store monitored fd structs */
>         struct rb_root rbr;
>
> -       /* wakeup_source used when ep_send_events is running */
> -       struct wakeup_source *ws;
> -
>         /* The user that created the eventpoll descriptor */
>         struct user_struct *user;
>
> @@ -718,7 +715,6 @@ static void ep_free(struct eventpoll *ep)
>         mutex_unlock(&epmutex);
>         mutex_destroy(&ep->mtx);
>         free_uid(ep->user);
> -       wakeup_source_unregister(ep->ws);
>         kfree(ep);
>  }
>
> @@ -1137,12 +1133,6 @@ static int ep_create_wakeup_source(struct epitem *epi)
>         const char *name;
>         struct wakeup_source *ws;
>
> -       if (!epi->ep->ws) {
> -               epi->ep->ws = wakeup_source_register("eventpoll");
> -               if (!epi->ep->ws)
> -                       return -ENOMEM;
> -       }
> -
>         name = epi->ffd.file->f_path.dentry->d_name.name;
>         ws = wakeup_source_register(name);
>
> @@ -1390,22 +1380,6 @@ static int ep_send_events(struct eventpoll *ep, bool *eavail,
>                 WARN_ON(state != EP_STATE_READY);
>                 wfcq_node_init(&epi->rdllink);
>
> -               /*
> -                * Activate ep->ws before deactivating epi->ws to prevent
> -                * triggering auto-suspend here (in case we reactive epi->ws
> -                * below).
> -                *
> -                * This could be rearranged to delay the deactivation of epi->ws
> -                * instead, but then epi->ws would temporarily be out of sync
> -                * with epi->state.
> -                */
> -               ws = ep_wakeup_source(epi);
> -               if (ws) {
> -                       if (ws->active)
> -                               __pm_stay_awake(ep->ws);
> -                       __pm_relax(ws);
> -               }
> -
>                 revents = ep_item_poll(epi, &pt);
>
>                 /*
> @@ -1419,7 +1393,6 @@ static int ep_send_events(struct eventpoll *ep, bool *eavail,
>                             __put_user(epi->event.data, &uevent->data)) {
>                                 wfcq_enqueue_local(&ep->txlhead, &ep->txltail,
>                                                         &epi->rdllink);
> -                               ep_pm_stay_awake(epi);
>                                 if (!eventcnt)
>                                         eventcnt = -EFAULT;
>                                 break;
> @@ -1441,13 +1414,34 @@ static int ep_send_events(struct eventpoll *ep, bool *eavail,
>                                  */
>                                 wfcq_enqueue_local(&lthead, &lttail,
>                                                         &epi->rdllink);
> -                               ep_pm_stay_awake(epi);
>                                 continue;
>                         }
>                 }
>
>                 /*
> -                * reset item state for EPOLLONESHOT and EPOLLET
> +                * Deactivate the wakeup source before marking it idle.
> +                * The barrier implied by the spinlock in __pm_relax ensures
> +                * any ep_poll_callback callers running will see the
> +                * deactivated ws before epi->state == EP_STATE_IDLE.
> +                *
> +                * For EPOLLET, the event may still be merged into the one
> +                * that is currently on its way into userspace, but it has
> +                * always been the responsibility of userspace to trigger
> +                * EAGAIN on the file before it expects the item to appear
> +                * again in epoll_wait.
> +                *
> +                * Level Trigger never gets here, so the ws remains active.

I don't think this statement is true.

> +                *
> +                * EPOLLONESHOT will either be dropped by ep_poll_callback
> +                * or dropped the next time ep_send_events is called, so the
> +                * ws is irrelevant until it is hit by ep_modify
> +                */
> +               ws = ep_wakeup_source(epi);
> +               if (ws)
> +                       __pm_relax(ws);
> +
> +               /*
> +                * reset item state for EPOLLONESHOT and EPOLLET.
>                  * no barrier here, rely on ep->mtx release for write barrier
>                  */
>                 epi->state = EP_STATE_IDLE;
>
> --
> Eric Wong



-- 
Arve Hjønnevåg
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux