Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`

On 11/30/16 12:43, Donald Buczek wrote:
On 11/30/16 12:09, Michal Hocko wrote:
[CCing Paul]

On Wed 30-11-16 11:28:34, Donald Buczek wrote:
[...]
shrink_active_list gets and releases the spinlock and calls cond_resched(). This should give other tasks a chance to run. Just as an experiment, I'm trying

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1921,7 +1921,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
         spin_unlock_irq(&pgdat->lru_lock);

         while (!list_empty(&l_hold)) {
-               cond_resched();
+               cond_resched_rcu_qs();
                 page = lru_to_page(&l_hold);
                 list_del(&page->lru);

and didn't hit an rcu_sched warning for >21 hours of uptime now. We'll see.

This is really interesting! Is it possible that the RCU stall detector
is somehow confused?

Wait... 21 hours is not yet a test result.

For the record: we have had no stall warnings after 2 days and 20 hours of uptime now, so I'm quite confident that my patch above fixed the problem for v4.8.0. On previous boots the RCU warnings started after 37, 0.2, 1, 2, and 0.8 hours of uptime.

Now I've applied this patch to the latest stable (v4.8.11) on another backup machine, which suffered even more rcu stalls.

Donald

[...]

--
Donald Buczek
buczek@xxxxxxxxxxxxx
Tel: +49 30 8413 1433
