Hi Oleg,
On 12/28/24 5:32 PM, Oleg Nesterov wrote:
> On 12/28, Oleg Nesterov wrote:
>> If nothing else, consider
>>
>>         int CONDITION;
>>         wait_queue_head_t WQ;
>>
>>         void wake(void)
>>         {
>>                 CONDITION = 1;
>>                 wake_up(WQ);
>>         }
>>
>>         void wait(void)
>>         {
>>                 DEFINE_WAIT_FUNC(entry, woken_wake_function);
>>
>>                 add_wait_queue(WQ, entry);
>>                 if (!CONDITION)
>>                         wait_woken(entry, ...);
>>                 remove_wait_queue(WQ, entry);
>>         }
>>
>> this code is correct even if LOAD(CONDITION) can leak into the critical
>> section in add_wait_queue(), so CPU running wait() can actually do
>>
>>         // add_wait_queue
>>         spin_lock(WQ->lock);
>>         LOAD(CONDITION);        // false!
>>         list_add(entry, head);
>>         spin_unlock(WQ->lock);
>>
>>         if (!false)             // result of the LOAD above
>>                 wait_woken(entry, ...);
>>
>> Now suppose that another CPU executes wake() between LOAD(CONDITION)
>> and list_add(entry, head). With your patch wait() will miss the event.
>>
>> The same for __pollwait(), I think...
>>
>> No?
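
Just to spell out why the current code gets away with the leaking LOAD,
as I understand it: wake_up() also takes WQ->lock, so only two
interleavings of the two critical sections are possible. A rough sketch
in the same pseudo-notation (my paraphrase, not the actual code):

        // Case A: wake() takes WQ->lock first
        CPU1: STORE(CONDITION); lock; /* list empty */ unlock;
        CPU0: lock; LOAD(CONDITION) == 1; list_add(entry, head); unlock;
              // the leaked LOAD runs after CPU1's unlock -> sees 1, no sleep

        // Case B: wait() takes WQ->lock first
        CPU0: lock; LOAD(CONDITION) == 0; list_add(entry, head); unlock;
        CPU1: STORE(CONDITION); lock; woken_wake_function(); unlock;
              // sets WQ_FLAG_WOKEN and wakes the task
        CPU0: wait_woken() sees WQ_FLAG_WOKEN or is woken -> event not lost

I.e. it is the lock acquisition in wake_up() that the critical section
in add_wait_queue() pairs with.
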
> Even simpler,
>
>         void wait(void)
>         {
>                 DEFINE_WAIT(entry);
>
>                 __set_current_state(XXX);
>                 add_wait_queue(WQ, entry);
>                 if (!CONDITION)
>                         schedule();
>                 remove_wait_queue(WQ, entry);
>                 __set_current_state(TASK_RUNNING);
>         }
>
> This code is ugly but currently correct unless I am totally confused.
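
For this variant my understanding is that the task-state handshake does
the same job; again only a sketch of how I read it, not the actual code:

        // If wake_up() runs after the entry is queued but before the
        // waiter reaches schedule():
        //   - try_to_wake_up() (via the default wake function) sees
        //     task->state == XXX, because the store to current->state is
        //     visible at the latest when the waiter drops WQ->lock, and
        //     it sets the state back to TASK_RUNNING.
        //   - schedule() entered with ->state == TASK_RUNNING does not
        //     block, so the waiter cannot sleep past the event.
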
It is a change of the add_wait_queue() path, and thus impacts all callers.
With (busybox) "find /sys /proc | grep aaaabbbccc", I've seen 16385
wakeup calls with an empty queue and just 6 with an entry in the queue.
But on other workloads, the ratio was more like 75% empty / 25% with
entries.
I.e.: we would have long discussions if the change only helps some use
cases and might have a negative impact on others.
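
(Illustration only: a trivial wrapper with two counters is enough to
collect such numbers; the names below are made up and this is not meant
as a patch:)

        /* illustration only: how often does a wakeup find the queue empty? */
        static atomic_t wakeups_empty = ATOMIC_INIT(0);
        static atomic_t wakeups_with_waiters = ATOMIC_INIT(0);

        static inline void counted_wake_up(struct wait_queue_head *wq)
        {
                if (waitqueue_active(wq))   /* lockless peek, statistics only */
                        atomic_inc(&wakeups_with_waiters);
                else
                        atomic_inc(&wakeups_empty);
                wake_up(wq);
        }
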
And: Your proposal is in conflict with
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/kernel/fork.c?h=v2.6.0&id=e220fdf7a39b54a758f4102bdd9d0d5706aa32a7

But I do not see the issue; the worst possible scenario should be
something like:

        // add_wait_queue
        spin_lock(WQ->lock);
        LOAD(CONDITION);        // false!
        list_add(entry, head);
        STORE(current_state);   // the __set_current_state(XXX) from above,
                                // sunk into the critical section
        spin_unlock(WQ->lock);
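
For completeness, my reading of the waker side in the same notation,
assuming the current wake_up(), i.e. that WQ->lock is still taken
unconditionally:

        // wake()
        STORE(CONDITION);               // may sink into the CS below, but
                                        // not past spin_unlock(WQ->lock)
        spin_lock(WQ->lock);            // either this runs before the block
                                        // above (then the waiter's leaked
                                        // LOAD sees CONDITION == 1), or it
                                        // runs after the waiter's unlock
                                        // (then the entry and current_state
                                        // are both visible and the wakeup
                                        // is delivered)
        spin_unlock(WQ->lock);

So as long as the waker goes through WQ->lock, the reordering above does
not look harmful to me.
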
--
Manfred