On 22.12.2014 17:25, Vladimir Davydov wrote:
E.g. suppose processes are
governed by FIFO and kswapd happens to have a higher prio than the
process killed by OOM. Then after cond_resched kswapd will be picked for
execution again, and the killing process won't have a chance to remove
itself from the wait queue.
Except that kswapd runs as SCHED_NORMAL with 0 priority.
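
For anyone who wants to check a task's scheduling class from user space rather than take it on trust, here is a minimal sketch using the POSIX sched_getscheduler() and sched_getparam() calls. The helper names policy_name() and print_sched_info() are made up for illustration; note that what the kernel calls SCHED_NORMAL is exposed to user space as SCHED_OTHER.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: map a POSIX policy number to a readable name. */
static const char *policy_name(int policy)
{
	switch (policy) {
	case SCHED_OTHER: return "SCHED_OTHER";	/* SCHED_NORMAL inside the kernel */
	case SCHED_FIFO:  return "SCHED_FIFO";
	case SCHED_RR:    return "SCHED_RR";
	default:          return "unknown";
	}
}

/*
 * Hypothetical helper: print the policy and static priority of a task.
 * pid 0 means the calling process. Returns 0 on success, -1 on error.
 */
static int print_sched_info(pid_t pid)
{
	struct sched_param sp;
	int policy = sched_getscheduler(pid);

	if (policy < 0 || sched_getparam(pid, &sp) < 0) {
		perror("sched query");
		return -1;
	}
	printf("pid %d: %s, priority %d\n",
	       (int)pid, policy_name(policy), sp.sched_priority);
	return 0;
}
```

Calling print_sched_info() with the pid of a kswapd thread should report the policy and priority it currently runs with, assuming the caller is allowed to query that task.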
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 744e2b491527..2a123634c220 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2984,6 +2984,9 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
 	if (remaining)
 		return false;
 
+	if (!pgdat_balanced(pgdat, order, classzone_idx))
+		return false;
+
What would be the consequences of not waking up pfmemalloc waiters while
the node is not balanced?
They will get woken up a bit later in balance_pgdat. This might result
in latency spikes though. In order not to change the original behaviour
we could always wake all pfmemalloc waiters no matter if we are going to
sleep or not:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 744e2b491527..a21e0bd563c3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2993,10 +2993,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
 	 * so wake them now if necessary. If necessary, processes will wake
 	 * kswapd and get throttled again
 	 */
-	if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
-		wake_up(&pgdat->pfmemalloc_wait);
-		return false;
-	}
+	wake_up_all(&pgdat->pfmemalloc_wait);
 
 	return pgdat_balanced(pgdat, order, classzone_idx);
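
To make the behavioural difference easier to see, here is a rough user-space model of the sleep decision before and after this change. It is only a sketch and not kernel code: struct node_model, model_wake_up_all() and the two prepare_sleep_*() functions are hypothetical names, and the model assumes that a waiter which is already runnable (such as a killed task) stays on the queue until it actually gets to run.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code. Only the two pieces of
 * state that matter for the sleep decision are kept.
 */
struct node_model {
	int  queued_waiters;	/* entries still on pfmemalloc_wait */
	bool balanced;		/* stand-in for pgdat_balanced() */
};

/*
 * Assumption: a wakeup only makes tasks runnable. A task that is already
 * runnable stays queued until it runs and dequeues itself, so the model
 * deliberately leaves queued_waiters untouched.
 */
static void model_wake_up_all(struct node_model *n)
{
	(void)n;
}

/* Before the change: an active wait queue always vetoes sleeping. */
static bool prepare_sleep_before(struct node_model *n, long remaining)
{
	if (remaining)
		return false;
	if (n->queued_waiters > 0) {
		model_wake_up_all(n);
		return false;
	}
	return n->balanced;
}

/* After the change: wake unconditionally, decide on balance alone. */
static bool prepare_sleep_after(struct node_model *n, long remaining)
{
	if (remaining)
		return false;
	model_wake_up_all(n);
	return n->balanced;
}
```

With one stale waiter on a balanced node, prepare_sleep_before() keeps returning false for as long as the waiter does not run, while prepare_sleep_after() returns true and lets the caller go to sleep.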
So you are relying on scheduling points somewhere down in
balance_pgdat. That should be sufficient. I am still quite surprised
that we have an OOM victim still on the queue and a balanced pgdat here,
because the OOM victim didn't have a chance to free memory. So somebody
else must have released a lot of memory after the OOM.
This patch seems better than the one from Vlastimil. Care to post it
with the full changelog, please?
Attached below (merged with 2/2). I haven't checked that it does fix the
issue, because I don't have the reproducer, so it should be committed
only if Vlastimil approves it.
I agree it's the right fix, thanks a lot. We only have a synthetic
reproducer, as the real scenario would be hard to trigger reliably. I
can test it later, but I think it's reasonably clear the patch will help.
I would just personally keep the comment clarification in the patch, but
it's not a critical issue.
Vlastimil