> Hello, Joel, Paul.
>
> > Hi Vlad, Paul,
> >
> > On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> > > On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > > Currently the monitor work is scheduled with a fixed interval that
> > > > > > > is HZ/20, i.e. every 50 milliseconds. The drawback of such an approach
> > > > > > > is low utilization of page slots in some scenarios. A page can store
> > > > > > > up to 512 records. For example, on an Android system it can look like:
> > > > > > >
> > > > > > > <snip>
> > > > > > > kworker/3:0-13872 [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > > > > kworker/3:0-13872 [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > kworker/1:2-20434 [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > kworker/1:2-20434 [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > > > > kworker/0:1-20052 [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > > > > kworker/0:1-20052 [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > > > > kworker/2:3-14372 [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > kworker/2:3-14372 [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > kworker/1:2-20434 [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > > > > kworker/2:3-14372 [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > > > > kworker/1:2-20434 [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > > > > kworker/2:3-14372 [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > > > > kworker/3:0-13872 [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > > > > kworker/1:2-20434 [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > > > > kworker/1:2-20434 [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > > > <snip>
> > > > > > >
> > > > > > > where a page carries only a few records to reclaim memory. In order to
> > > > > > > improve batching and make utilization more efficient, the patch introduces
> > > > > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > > > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected: in that
> > > > > > > case memory reclaim occurs more often, whereas in mostly idle cases the
> > > > > > > interval is set to its maximum timeout, which improves the utilization of
> > > > > > > page slots.
> > > > > > >
> > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > > > >
> > > > > > That does look like a problem well worth solving!
> > > > >
> > > > > Agreed, better ideas make a better final solution :)
> > > > >
> > > > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > > > need a shorter timeout?
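A minimal userspace sketch of the two-level drain interval described in the patch, assuming that a "flood" means at least one full page (512 records, per the commit message) of pending objects, which matches the page-full threshold Vlad mentions later in the thread. HZ, KFREE_DRAIN_JIFFIES_FIXED, the numeric values given to KFREE_DRAIN_JIFFIES_MAX/MIN, RECORDS_PER_PAGE, is_flood() and drain_interval() are hypothetical illustrations, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the two-level drain interval. HZ, the fixed
 * interval and the MAX/MIN values are assumptions for illustration;
 * the 512-records-per-page figure is taken from the commit message.
 */
#define HZ                        1000UL
#define KFREE_DRAIN_JIFFIES_FIXED (HZ / 20)  /* current fixed interval: 50 ms */
#define KFREE_DRAIN_JIFFIES_MAX   (HZ)       /* assumed: mostly idle, 1 s */
#define KFREE_DRAIN_JIFFIES_MIN   (HZ / 50)  /* assumed: flood, 20 ms */
#define RECORDS_PER_PAGE          512UL

/* Assumed flood criterion: at least one full page of pending objects. */
static bool is_flood(unsigned long pending)
{
	return pending >= RECORDS_PER_PAGE;
}

/* Delay, in jiffies, before the monitor work drains pending objects. */
static unsigned long drain_interval(unsigned long pending)
{
	/* Sanity check on the assumed constants. */
	assert(KFREE_DRAIN_JIFFIES_MIN < KFREE_DRAIN_JIFFIES_MAX);
	return is_flood(pending) ? KFREE_DRAIN_JIFFIES_MIN
				 : KFREE_DRAIN_JIFFIES_MAX;
}
```

With these assumed values, a mostly idle CPU waits HZ jiffies between drains instead of HZ/20. At a steady low free rate, each page therefore collects roughly twenty times as many records before it is reclaimed, and the delay drops to HZ/50 only once a page fills.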
> > > > > >
> > > > > To offload faster, because otherwise we run into the classic issue:
> > > > > a low-memory condition resulting in OOM.
> > > >
> > > > But doesn't each callback queued during the flood give us an opportunity
> > > > to react to the flood? That will be way more fine-grained than any
> > > > reasonable timer, right? Or am I missing something?
> > > >
> > > We can set the timer to zero or to the current "jiffies" to initiate
> > > the offloading if the page is full. In that sense it probably makes
> > > sense to propagate those two attributes to user space, so the user
> > > can configure the min/max drain interval.
> > >
> > > Or we can deal only with a fixed interval exposed via sysfs, so that
> > > the user can control it. In that case we can get rid of the MIN one
> > > and just trigger the timer if the page is full. I think this approach
> > > is better.
> >
> > Yes, I also think triggering the timer with a zero timeout is better.
> > Can you (Vlad) accomplish that by just calling the timer callback
> > inline, instead of queuing a timer? I imagine you would just do
> > queue_work() instead of queue_delayed_work() in this scenario.
> >
> > > > I do agree that the action would often need to be indirect to avoid the
> > > > memory-allocation-state hassles, but we already can do that, either via
> > > > an extremely short-term hrtimer or something like irq-work.
> > > >
> > > > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > > > and provide faster response to the start of a callback flood?
> > > > > >
> > > > > I rely on krcp->count because we cannot always store the pointer in
> > > > > the page slots. We cannot allocate a page in the caller's context, so
> > > > > we use a page-cache worker that fills the cache in a normal context.
> > > > > While it populates the cache, pointers are temporarily queued to the
> > > > > linked list.
> > > > >
> > > > > Any thoughts?
> > > >
> > > > There are a great many ways to approach this. One of them is to maintain
> > > > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > > > counter each jiffy.
> > > >
> > > > Or am I missing a trick here?
> > > >
> > > Do you mean having a per-CPU timer that checks the per-CPU freed
> > > counter and schedules the work if it is needed? Or have I missed
> > > your point?
> >
> > I think he (Paul) is describing a way 'flood detection' can work,
> > similar to how the bypass list code is implemented. There he maintains
> > a count and queues onto the bypass list only if that count exceeds a
> > limit.
> >
> OK, I see that. We also do a similar thing: we say it is a flood when a
> page becomes full, so that is a kind of threshold we pass.
>
> > This code:
> >
> > 	// If we have advanced to a new jiffy, reset counts to allow
> > 	// moving back from ->nocb_bypass to ->cblist.
> > 	if (j == rdp->nocb_nobypass_last) {
> > 		c = rdp->nocb_nobypass_count + 1;
> > 	} else {
> > 		WRITE_ONCE(rdp->nocb_nobypass_last, j);
> > 		c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
> > 		if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
> > 				 nocb_nobypass_lim_per_jiffy))
> > 			c = 0;
> > 		else if (c > nocb_nobypass_lim_per_jiffy)
> > 			c = nocb_nobypass_lim_per_jiffy;
> > 	}
> > 	WRITE_ONCE(rdp->nocb_nobypass_count, c);
> >
> >
> > Your (Vlad's) approach OTOH is also fine with me: you check if the page
> > is full and use that as a 'flood is happening' detector.
> >
> OK, thank you, Joel. I also think that this way we improve batching and
> utilization of the page, which is actually the intention of the patch in
> question.
>
> Paul, will you pick this patch?

Thanks!

--
Uladzislau Rezki