Hi,

I'm iterating again on this topic, this time with the author of the
patch Cc'ed.

The following commit:

    a218cc491420 (epoll: use rwlock in order to reduce ep_poll_callback() contention)

has changed ep->lock into an rwlock. This can cause priority inversion
on PREEMPT_RT. Here is an example:

1) High priority task A waits for events in epoll_wait(). Nothing shows
   up, so it goes to sleep waiting for new events in the ep_poll() loop.

2) Lower priority task B brings new events in ep_poll_callback(), waking
   up A while still holding read_lock(ep->lock).

3) Task A wakes up immediately and tries to grab write_lock(ep->lock),
   but it has to wait for task B to release read_lock(ep->lock).

Unfortunately there is no priority inheritance when write_lock() is
called on an rwlock that is already read_lock'ed. So A is stuck behind
task B, which may even be preempted by yet another task before
releasing read_lock(ep->lock).

Now how to solve this? Several possibilities:

== Delay the wake up until after releasing the read_lock()? ==

That solves only part of the problem. If another event comes up
concurrently, we are back to the original issue.

== Make rwlock more fair? ==

Currently read_lock() only acquires the rtmutex if the lock is already
write-held (or a write_lock() is waiting to acquire it). So if
read_lock() happens after write_lock(), fairness is observed, but if
write_lock() happens after read_lock(), priority inheritance doesn't
happen. I think there have been attempts to solve this in the past but
some issues arose (I don't know the exact details; the comments in
rwbase_rt.c give some clues).

== Convert the rwlock to RCU? ==

Traditionally, we try to convert problematic rwlocks to RCU. I'm not
sure the situation fits here because the rwlock is used the other way
around: the epoll consumer does the write_lock() and the producers do
read_lock(). Concurrent producers then use an ad-hoc concurrent list
add (see list_add_tail_lockless) to handle racy modifications. There
are also list modifications on both sides.
Elements are added by the producers, and read and deleted (sometimes
even re-added) on the consumer side. Perhaps RCU could be used while
keeping locking on the consumer side...

== Convert to llist? ==

It's a possibility, but some operations like single element deletion
may be costly because only llist_add() and llist_del_all() are atomic
on llist. !CONFIG_PREEMPT_RT might not be happy about it.

== Consider epoll not PREEMPT_RT friendly? ==

A last resort is to simply consider that epoll is not RT-friendly and
suggest using simpler alternatives like poll()...

Any thoughts?
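
PS: to make the cost concern in the llist option above more concrete,
here is a minimal userspace sketch using C11 atomics. It is only an
analogy, not the kernel's llist code: struct node, struct lhead and the
sketch_*() helpers are hypothetical names made up for this
illustration. It shows a lock-free push and an atomic "detach
everything", and then what removing a single element could look like
when those are the only two atomic operations available.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analogue of an llist; not the kernel code. */
struct node {
	struct node *next;
	int val;
};

struct lhead {
	_Atomic(struct node *) first;
};

/* Lock-free push, safe for concurrent producers (in the spirit of llist_add()). */
static void sketch_add(struct lhead *h, struct node *n)
{
	struct node *old;

	assert(n != NULL);
	old = atomic_load(&h->first);
	do {
		/* On CAS failure, 'old' is refreshed with the current head. */
		n->next = old;
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Detach the whole list with one atomic exchange (in the spirit of llist_del_all()). */
static struct node *sketch_del_all(struct lhead *h)
{
	return atomic_exchange(&h->first, NULL);
}

/*
 * Remove a single element. With only the two primitives above, the
 * consumer has to detach everything, skip the victim and push the rest
 * back: O(n) per deletion, and the original order is not preserved.
 * Returns 1 if the victim was found, 0 otherwise.
 */
static int sketch_del_one(struct lhead *h, struct node *victim)
{
	struct node *n = sketch_del_all(h);
	struct node *next;
	int found = 0;

	for (; n; n = next) {
		next = n->next;
		if (n == victim)
			found = 1;
		else
			sketch_add(h, n);
	}
	return found;
}
```

Every single deletion walks the full list and re-adds the survivors,
racing with producers that keep pushing in the meantime. That is the
kind of overhead I would expect !CONFIG_PREEMPT_RT to notice, though I
haven't measured anything.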