Am 29.01.2015 19:56, schrieb Chris Mason: > On Mon, Jan 19, 2015 at 10:33:19PM +0300, Dan Carpenter wrote: >> Since "count" is an unsigned int, then these conditions are never true: >> >> if (count == ULLONG_MAX) >> events |= POLLERR; >> if (ULLONG_MAX - 1 > count) >> events |= POLLOUT; >> >> It should be a u64, because that's what ctx->count is. Also GCC >> complains that "flags" is unused. >> >> Fixes: a90de8a54127 ('eventfd: don't take the spinlock in eventfd_poll') >> Signed-off-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx> >> >> diff --git a/fs/eventfd.c b/fs/eventfd.c >> index 439e6f0..8d0c0df 100644 >> --- a/fs/eventfd.c >> +++ b/fs/eventfd.c >> @@ -118,8 +118,7 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait) >> { >> struct eventfd_ctx *ctx = file->private_data; >> unsigned int events = 0; >> - unsigned long flags; >> - unsigned int count; >> + u64 count; >> >> poll_wait(file, &ctx->wqh, wait); >> smp_rmb(); > > Andrew, not sure if you want to take Dan's incremental or a new v2. I > ran this one through the ltp poll and event fd tests. > > From: Chris Mason <clm@xxxxxx> > Date: Wed, 28 Jan 2015 10:15:58 -0800 > Subject: [PATCH v2] eventfd: don't take the spinlock in eventfd_poll > > The spinlock in eventfd_poll is trying to protect the count of events > so it can decide if it should return POLLIN, POLLERR, or POLLOUT. But, > because of the way we drop the lock after calling poll_wait, and drop it > again before returning, we have the same pile of races with the lock as > we do with a single read of ctx->count(). > > This replaces the lock with a read barrier and single read. > > eventfd_write does a single bump of ctx->count, so this should not add > new races with adding events. eventfd_read is similar, it will do a > single decrement with the lock held, and so we're making the race with > concurrent readers slightly larger. > > This spinlock is the top CPU user in kernel code during one of our workloads. > Removing it gives us a ~2% boost. > > Signed-off-by: Chris Mason <clm@xxxxxx> > Fixed-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx> > --- > fs/eventfd.c | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > v1->v2 use a u64 for count and get rid of unused flags (Dan Carpenter) > > diff --git a/fs/eventfd.c b/fs/eventfd.c > index 4b0a226..8d0c0df 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -118,18 +118,18 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait) > { > struct eventfd_ctx *ctx = file->private_data; > unsigned int events = 0; > - unsigned long flags; > + u64 count; > > poll_wait(file, &ctx->wqh, wait); > + smp_rmb(); > + count = ctx->count; > > - spin_lock_irqsave(&ctx->wqh.lock, flags); > - if (ctx->count > 0) > + if (count > 0) > events |= POLLIN; > - if (ctx->count == ULLONG_MAX) > + if (count == ULLONG_MAX) > events |= POLLERR; > - if (ULLONG_MAX - 1 > ctx->count) > + if (ULLONG_MAX - 1 > count) > events |= POLLOUT; > - spin_unlock_irqrestore(&ctx->wqh.lock, flags); > > return events; > } Hi Chris, i was thinking about the code here: if (count > 0) events |= POLLIN; if (count == ULLONG_MAX) events |= POLLERR; if (count < ULLONG_MAX) events |= POLLOUT; My understanding is that it would set POLLIN|POLLOUT in most cases. Is that intended ? re, wh -- To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html