On Sat, Jul 30, 2011 at 2:26 PM, Nelson Elhage <nelhage@xxxxxxxxxxx> wrote:
> Oof, this is kinda ugly.
>
> I *believe* that as of
>
>     22bacca4 epoll: prevent creating circular epoll structures
>
> the epoll locking is correct, with the rule that two ep->mtx's can be
> locked recursively iff ep1 contains ep2 (possibly indirectly), and
> that if ep1 contains ep2, ep1 will always be locked first. Since
> 22bacca4 eliminated the possibility of epoll cycles, this means there
> is a well-defined lock order.
>
> I *think* that for any static configuration of epoll file descriptors,
> we can fix the problem by doing something like using the "call_nests"
> parameter passed by ep_call_nested as the lock subkey, but I haven't
> thought this through completely.
>
> However, since that lock order is subject to change, and even
> reversal, at runtime, I think the following (pathological) sequence of
> userspace calls will trigger lockdep warnings, even though there is
> never any risk of deadlock:

Thinking about this more, I think that the "call_nests" approach won't
have this problem, since that lets the locks change subclasses exactly
as necessary. Unless someone beats me to it, I'll see if I can put such
a patch together.

- Nelson

>
> - Create epoll fds ep1 and ep2
> - Add ep1 to ep2
> - Do some operations that result in recursive locking
> - Remove ep1 from ep2
> - Add ep2 to ep1
> - Do some operations that result in recursive locking
>
> In fact, that program should trigger warnings even if we did the
> pathological thing of using the address of the 'struct eventpoll' as
> the subclass [1], since it is *literally the same two locks* that are
> getting acquired in different orders at different times.
>
> I also don't see a way to simplify the epoll locking without adding
> more restrictions to how the API can be used. As far as I can tell,
> the situation really is just that nasty.
>
> - Nelson
>
> [1] Never mind that the "subclass" is an unsigned int, so we can't
> even do that directly on 64-bit systems.
>
> On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote:
>> (Sent to the addresses get_maintainer.pl suggested and to Davide and
>> Nelson, because this is about code they cared about half a year ago.
>> CC'ed to the addresses involved until now.)
>>
>> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote:
>> > That number turned out to be 722472
>> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).
>>
>> 0) This seems to be a lockdep false alarm. The cause is an epoll
>> instance added to another epoll instance (ie, nesting epoll instances).
>> Apparently lockdep isn't supplied enough information to determine what's
>> going on here. Now there might be a number of ways to fix this. But
>> after having looked at this for quite some time and updating the above
>> bug report a number of times, I guessed that involving people outside
>> those tracking that report might move things forward towards a solution.
>> At least, I wasn't able to find a, well, clean solution.
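For reference, the pathological sequence described above boils down to a
handful of syscalls. The following is only an illustrative sketch, not a
tested reproducer: error handling is mostly skipped, and the two "do some
operations" steps are approximated by registering a pipe fd and calling
epoll_wait(), which (going by the call chain quoted below) is one way to
reach the nested ep->mtx acquisitions:

#include <sys/epoll.h>
#include <unistd.h>

int main(void)
{
        struct epoll_event ev = { .events = EPOLLIN };
        struct epoll_event out;
        int pfd[2];
        int ep1, ep2;

        if (pipe(pfd))
                return 1;
        ep1 = epoll_create1(0);
        ep2 = epoll_create1(0);

        /* Add ep1 to ep2: ep2's ep->mtx is taken before ep1's. */
        ev.data.fd = ep1;
        epoll_ctl(ep2, EPOLL_CTL_ADD, ep1, &ev);

        /*
         * "Do some operations that result in recursive locking": make
         * ep1 readable, so that waiting on ep2 also has to poll ep1.
         */
        ev.data.fd = pfd[0];
        epoll_ctl(ep1, EPOLL_CTL_ADD, pfd[0], &ev);
        if (write(pfd[1], "x", 1) != 1)
                return 1;
        epoll_wait(ep2, &out, 1, 0);

        /* Remove ep1 from ep2 and reverse the nesting. */
        epoll_ctl(ep2, EPOLL_CTL_DEL, ep1, NULL);

        /* Add ep2 to ep1: now ep1's ep->mtx is taken before ep2's. */
        ev.data.fd = ep2;
        epoll_ctl(ep1, EPOLL_CTL_ADD, ep2, &ev);

        /* "Do some operations that result in recursive locking" again. */
        epoll_wait(ep1, &out, 1, 0);

        return 0;
}

The point is simply that the same two mutexes end up being taken in both
orders over the lifetime of the process, which is the pattern lockdep's
ordering check complains about even though no deadlock is possible here.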
>>
>> 1) The call chain triggering the warning with the nice
>>     *** DEADLOCK ***
>>
>> line can be summarized like this:
>>
>> sys_epoll_ctl
>>     mutex_lock epmutex
>>     ep_call_nested
>>         ep_loop_check_proc
>>             mutex_lock ep->mtx
>>             mutex_unlock ep->mtx
>>     mutex_lock ep->mtx
>>         ep_eventpoll_poll
>>             ep_ptable_queue_proc
>>                 ep_call_nested
>>                     ep_poll_readyevents_proc
>>                         ep_scan_ready_list
>>                             mutex_lock ep->mtx
>>                             ep_read_events_proc
>>                             mutex_unlock ep->mtx
>>     mutex_unlock ep->mtx
>>     mutex_unlock epmutex
>>
>> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices
>> recursive locking on ep->mtx. It is not supplied enough information to
>> determine that the two acquisitions involve two separate epoll
>> instances (the outer instance and the nested instance). The solution
>> appears to involve supplying lockdep that information (ie, "lockdep
>> annotation").
>>
>> 3) Please see the bugzilla.redhat.com report for further background.
>>
>>
>> Paul Bolle
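For what it's worth, the "call_nests as lock subkey" idea mentioned above
would presumably end up as a lockdep annotation along the lines sketched
below. This is only a sketch of the idea, not a patch: the helper name
ep_mtx_lock_nested() is invented for illustration, and it assumes the
nesting depth that ep_call_nested() already tracks can be threaded down
to the places in fs/eventpoll.c where ep->mtx is taken.

#include <linux/mutex.h>

/*
 * Sketch only: ep_mtx_lock_nested() is a hypothetical helper, not a
 * function in fs/eventpoll.c (which is where struct eventpoll and its
 * ->mtx member are defined).  The idea is to hand lockdep the nesting
 * depth as the mutex subclass, so that taking a nested instance's
 * ep->mtx while the outer instance's ep->mtx is held is recorded as a
 * different nesting level instead of as recursive locking of the same
 * lock class.
 */
static void ep_mtx_lock_nested(struct eventpoll *ep, int call_nests)
{
        /* Outermost epoll instance uses subclass 0, each nested level +1. */
        mutex_lock_nested(&ep->mtx, call_nests);
}

Note that lockdep supports only a small, fixed number of subclasses per
lock class (MAX_LOCKDEP_SUBCLASSES), so this relies on ep_call_nested()
already capping the nesting depth it will recurse through. It also
sidesteps the problem in [1], since the subclass is a small depth value
rather than a pointer.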