On 09/22/2013 04:41 PM, Eric Wong wrote:
> Jason Baron <jbaron@xxxxxxxxxx> wrote:
>> epoll: reduce usage of global 'epmutex' lock
>>
>> Epoll file descriptors that are 1 link from a wakeup source and
>> are not nested within other epoll descriptors, or pointing to
>> other epoll descriptors, don't need to check for loop creation or
>> the creation of wakeup storms. Because of this we can avoid taking
>> the global 'epmutex' in these cases. This state for the epoll file
>> descriptor is marked as 'EVENTPOLL_BASIC'. Once the epoll file
>> descriptor is attached to another epoll file descriptor it is
>> labeled as 'EVENTPOLL_COMPLEX', and full loop checking and wakeup
>> storm creation are checked using the global 'epmutex'. It does
>> not transition back. Hopefully, this is a common usecase...
>
> Cool. I was thinking about doing the same thing down the line (for
> EPOLL_CTL_ADD, too)
>
>> @@ -166,6 +167,14 @@ struct epitem {
>>
>>  	/* The structure that describe the interested events and the source fd */
>>  	struct epoll_event event;
>> +
>> +	/* TODO: really necessary? */
>> +	int on_list;
>
> There's some things we can overload to avoid increasing epitem size
> (.ep, .ffd.fd, ...), so on_list should be unnecessary.

Even with 'on_list', the size of 'epitem' stayed at 128 bytes. I'm not
sure whether there are compile options you are concerned about that
could push it over that size, so I think that change is ok.

The biggest hack here was using 'struct rb_node' instead of a proper
'struct rcu_head', so as not to increase the size of epitem. I think
this is safe, and I've added build-time checks to ensure that
'struct rb_node' is never smaller than 'struct rcu_head'. But it's
rather hacky. I will probably break this change out separately when I
re-post so it can be reviewed independently.
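
For reference, the kind of build-time size check described above could
look roughly like the sketch below. It is a user-space illustration
only, not the patch itself. The '*_sketch' types are simplified,
hypothetical stand-ins for the kernel structures, the union overlay is
just one assumed way of sharing the storage, and C11 '_Static_assert'
stands in for a kernel-side check such as BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for 'struct rb_node' (assumption, not the kernel definition). */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *right;
	struct rb_node_sketch *left;
};

/* Simplified stand-in for 'struct rcu_head' (assumption, not the kernel definition). */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/*
 * Hypothetical, cut-down item: the tree node and the RCU callback head
 * share the same storage, so adding the RCU head does not grow the item.
 */
struct epitem_sketch {
	union {
		struct rb_node_sketch rbn;   /* used while the item is in the tree */
		struct rcu_head_sketch rcu;  /* used only once the item is being freed */
	};
	int fd;
};

/* Fail the build if the node storage could not hold an RCU head. */
_Static_assert(sizeof(struct rb_node_sketch) >= sizeof(struct rcu_head_sketch),
	       "rb_node storage is too small to be reused as an rcu_head");

/* Runtime counterpart of the check above; returns 1 when the overlay is sound. */
int overlay_fits(void)
{
	return sizeof(struct rb_node_sketch) >= sizeof(struct rcu_head_sketch) &&
	       offsetof(struct epitem_sketch, rbn) ==
	       offsetof(struct epitem_sketch, rcu);
}

/* Convenience self-check for use in a test harness. */
void overlay_selfcheck(void)
{
	assert(overlay_fits());
}
```

With the real kernel types, the same idea would catch any configuration
in which the two structures diverge in size at compile time rather than
at run time.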

Thanks,

-Jason