> On Nov 3, 2022, at 1:51 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Thu, Nov 03, 2022 at 01:41:43PM +0100, Uladzislau Rezki wrote:
>>>>>> /**
>>>>>> @@ -3066,10 +3068,12 @@ static void kfree_rcu_work(struct work_struct *work)
>>>>>>      struct kfree_rcu_cpu_work *krwp;
>>>>>>      int i, j;
>>>>>>
>>>>>> -    krwp = container_of(to_rcu_work(work),
>>>>>> +    krwp = container_of(work,
>>>>>>                          struct kfree_rcu_cpu_work, rcu_work);
>>>>>>      krcp = krwp->krcp;
>>>>>>
>>>>>> +    cond_synchronize_rcu(krwp->gp_snap);
>>>>>
>>>>> Might this provoke OOMs in case of callback flooding?
>>>>>
>>>>> An alternative might be something like this:
>>>>>
>>>>>     if (!poll_state_synchronize_rcu(krwp->gp_snap)) {
>>>>>         queue_rcu_work(system_wq, &krwp->rcu_work);
>>>>>         return;
>>>>>     }
>>>>>
>>>>> Either way gets you a non-lazy callback in the case where a grace
>>>>> period has not yet elapsed.
>>>>> Or am I missing something that prevents OOMs here?
>>>>
>>>> The memory consumption appears to be much less in his testing with the onslaught of kfree, which makes OOM probably less likely.
>>>>
>>>> Though, was your reasoning that in case of a grace period not elapsing, we need a non-lazy callback queued, so as to make the reclaim happen sooner?
>>>>
>>>> If so, the cond_synchronize_rcu() should already be conditionally queueing a non-lazy CB, since we don’t make synchronous users wait for seconds. Or did I miss something?
>>>
>>> My concern is that the synchronize_rcu() will block a kworker kthread
>>> for some time, and that in callback-flood situations this might slow
>>> things down due to exhausting the supply of kworkers.
>>>
>> This concern applies in both cases, I mean in the default configuration
>> and with the posted patch. The reclaim work, whose name is kfree_rcu_work(),
>> only makes progress once a GP has passed, so that rcu_work_rcufn() can
>> queue our reclaim kworker.
>>
>> As it is now:
>>
>> 1. Collect pointers; when we decide to drop them, we queue the
>> monitor_work() worker to the system_wq.
>>
>> 2. The monitor work, kfree_rcu_work(), tries to attach or, saying
>> it in other words, bypass a "backlog" to "free" channels.
>>
>> 3. It invokes queue_rcu_work(), which does call_rcu_flush() and
>> in its turn queues our worker from the handler. So the worker
>> is run after a GP has passed.
>
> So as it is now, we are not tying up a kworker kthread while waiting
> for the grace period, correct? We instead have an RCU callback queued
> during that time, and the kworker kthread gets involved only after the
> grace period ends.
>
>> With a patch:
>>
>> Steps [1] and [2] are the same. But on the third step we do:
>>
>> 1. Record the GP status for the last in channel;
>> 2. Directly queue the drain work without any call_rcu() helpers;
>> 3. On the reclaim worker entry we check if a GP has passed;
>> 4. If not, it invokes synchronize_rcu().
>
> And #4 changes that, by (sometimes) tying up a kworker kthread for the
> full grace period.
>
>> The patch eliminates extra steps by not going via the RCU-core route;
>> instead it directly invokes the reclaim worker, where it either
>> proceeds or waits for a GP if needed.
>
> I agree that the use of the polled API could be reducing delays, which
> is a good thing. Just being my usual greedy self and asking "Why not
> both?", that is use queue_rcu_work() instead of synchronize_rcu() in
> conjunction with the polled APIs so as to avoid both the grace-period
> delay and the tying up of the kworker kthread.
>
> Or am I missing something here?

Yeah I am with Paul on this, NAK on “blocking in kworker” instead of “checking for grace period + queuing either regular work or RCU work”. Note that blocking also adds a pointless and fully avoidable scheduler round trip.

- Joel

>
> Thanx, Paul
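
For illustration only, the "check for a grace period, then requeue instead of blocking" pattern from Paul's snippet can be modeled in userspace roughly as below. This is a sketch under loud assumptions: every model_* name is hypothetical and is not a kernel API, and the grace-period state is a plain counter rather than the kernel's real cookie encoding. It only shows that the worker never waits: it either defers itself or reclaims.

```c
/* Userspace model only. The model_* names are hypothetical stand-ins,
 * not kernel APIs, and the grace-period counter is a plain integer. */
#include <assert.h>
#include <stdbool.h>

/* Number of grace periods that have fully completed so far. */
static unsigned long gp_completed;

/* Stand-in for get_state_synchronize_rcu(): returns a cookie that is
 * satisfied once one more full grace period has completed. */
static unsigned long model_get_state(void)
{
	return gp_completed + 1;
}

/* Stand-in for poll_state_synchronize_rcu(): true if the grace period
 * named by the cookie has completed. Never blocks. */
static bool model_poll_state(unsigned long snap)
{
	return gp_completed >= snap;
}

enum model_result { MODEL_REQUEUED, MODEL_RECLAIMED };

struct model_krw {
	unsigned long gp_snap;	/* cookie recorded when the batch was detached */
	int requeues;		/* times the worker deferred instead of blocking */
	int reclaimed;		/* times the worker actually freed the batch */
};

/* Stand-in for the reclaim worker. */
static enum model_result model_kfree_rcu_work(struct model_krw *krwp)
{
	if (!model_poll_state(krwp->gp_snap)) {
		/* Kernel equivalent per the thread:
		 * queue_rcu_work(system_wq, &krwp->rcu_work); return; */
		krwp->requeues++;
		return MODEL_REQUEUED;
	}

	/* Reclaim is only legal after the grace period has elapsed. */
	assert(model_poll_state(krwp->gp_snap));
	krwp->reclaimed++;
	return MODEL_RECLAIMED;
}
```

In this model, a worker that runs before the grace period ends returns MODEL_REQUEUED immediately, which frees the (modeled) kworker for other work; once gp_completed is advanced, the next invocation reclaims. Replacing the requeue branch with a blocking wait would model the cond_synchronize_rcu() variant that the thread objects to.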