On Sun, Jul 21, 2019 at 08:28:05AM -0400, Michael S. Tsirkin wrote:
> Hi Paul, others,
>
> So it seems that vhost needs to call kfree_rcu from an ioctl. My worry
> is what happens if userspace starts cycling through lots of these
> ioctls.  Given we actually use rcu as an optimization, we could just
> disable the optimization temporarily - but the question would be how to
> detect an excessive rate without working too hard :) .
>
> I guess we could define as excessive any rate where callback is
> outstanding at the time when new structure is allocated.  I have very
> little understanding of rcu internals - so I wanted to check that the
> following more or less implements this heuristic before I spend time
> actually testing it.
>
> Could others pls take a look and let me know?

These look good as a way of seeing if there are any outstanding
callbacks, but in the case of Tree RCU, call_rcu_outstanding() would
almost never return false on a busy system.

Here are some alternatives:

o   RCU uses some pieces of Rao Shoaib's kfree_rcu() patches.
    The idea is to make kfree_rcu() locally buffer requests into
    batches of (say) 1,000, but processing smaller batches when RCU
    is idle, or when some smallish amount of time has passed with
    no more kfree_rcu() requests from that CPU.  RCU then takes in
    the batch using not call_rcu(), but rather queue_rcu_work().
    The resulting batch of kfree() calls would therefore execute in
    workqueue context rather than in softirq context, which should
    be much easier on the system.

    In theory, this would allow people to use kfree_rcu() without
    worrying quite so much about overload.  It would also not be
    that hard to implement.

o   Subsystems vulnerable to user-induced kfree_rcu() flooding use
    call_rcu() instead of kfree_rcu().  Keep a count of the number
    of things waiting for a grace period, and when this gets too
    large, disable the optimization.  It will then drain down, at
    which point the optimization can be re-enabled.
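
    To illustrate the counting approach in the second alternative,
    here is a rough single-threaded userspace model, not kernel code.
    Everything named in it is hypothetical: fake_call_rcu() and
    fake_grace_period() merely stand in for call_rcu() and the end of
    a grace period, and MAX_PENDING, nr_pending, obj_reclaim() and
    defer_free() are made up for the sketch.  It uses one shared
    atomic counter, which is only one of several possible ways to
    keep the count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 4	/* hypothetical threshold, chosen for the demo */

/* Stand-in for struct rcu_head (assumption: not the kernel type). */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
};

/* Callbacks waiting for the pretend grace period (not thread-safe). */
static struct fake_rcu_head *pending_list;

/* Number of objects currently waiting for a grace period. */
static atomic_int nr_pending;

/* Stand-in for call_rcu(): just queue the callback. */
static void fake_call_rcu(struct fake_rcu_head *head,
			  void (*func)(struct fake_rcu_head *head))
{
	head->func = func;
	head->next = pending_list;
	pending_list = head;
}

/* Pretend a grace period has ended: invoke every queued callback. */
static void fake_grace_period(void)
{
	struct fake_rcu_head *head = pending_list;

	pending_list = NULL;
	while (head) {
		struct fake_rcu_head *next = head->next;

		head->func(head);
		head = next;
	}
}

/* Callback: drop the count that was raised before queueing. */
static void obj_reclaim(struct fake_rcu_head *head)
{
	int old = atomic_fetch_sub(&nr_pending, 1);

	assert(old > 0);
	(void)old;
	(void)head;	/* a real callback would free the object here */
}

/*
 * Try to defer reclamation.  Returns true if the callback was queued,
 * false if too many are outstanding, in which case the caller has to
 * fall back to its non-optimized path.
 */
static bool defer_free(struct fake_rcu_head *head)
{
	if (atomic_fetch_add(&nr_pending, 1) >= MAX_PENDING) {
		atomic_fetch_sub(&nr_pending, 1);
		return false;
	}
	fake_call_rcu(head, obj_reclaim);
	return true;
}
```

    In this model, defer_free() starts refusing once MAX_PENDING
    callbacks are queued and accepts again after fake_grace_period()
    has drained them, which is the disable/re-enable behavior
    described above.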

    But please note that callbacks are -not- guaranteed to run on
    the CPU that queued them.  So yes, you would need a per-CPU
    counter, but you would need to periodically sum it up to check
    against the global state.  Or keep track of the CPU that
    did the call_rcu() so that you can atomically decrement in
    the callback the same counter that was atomically incremented
    just before the call_rcu().  Or any number of other approaches.

Also, the overhead is important.  For example, as far as I know,
current RCU gracefully handles close(open(...)) in a tight userspace
loop.  But there might be trouble due to tight userspace loops around
lighter-weight operations.

So an important question is "Just how fast is your ioctl?"  If it takes
(say) 100 microseconds to execute, there should be absolutely no problem.
On the other hand, if it can execute in 50 nanoseconds, this very likely
does need serious attention.

Other thoughts?

							Thanx, Paul

> Thanks!
>
> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
>
>
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index 477b4eb44af5..067909521d72 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -125,6 +125,25 @@ void synchronize_rcu(void)
>  }
>  EXPORT_SYMBOL_GPL(synchronize_rcu);
>
> +/*
> + * Helpful for rate-limiting kfree_rcu/call_rcu callbacks.
> + */
> +bool call_rcu_outstanding(void)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp;
> +	bool outstanding;
> +
> +	local_irq_save(flags);
> +	rdp = this_cpu_ptr(&rcu_data);
> +	outstanding = rcu_segcblist_empty(&rdp->cblist);
> +	outstanding = rcu_ctrlblk.donetail != rcu_ctrlblk.curtail;
> +	local_irq_restore(flags);
> +
> +	return outstanding;
> +}
> +EXPORT_SYMBOL_GPL(call_rcu_outstanding);
> +
>  /*
>   * Post an RCU callback to be invoked after the end of an RCU grace
>   * period.  But since we have but one CPU, that would be after any
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a14e5fbbea46..d4b9d61e637d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2482,6 +2482,24 @@ static void rcu_leak_callback(struct rcu_head *rhp)
>  {
>  }
>
> +/*
> + * Helpful for rate-limiting kfree_rcu/call_rcu callbacks.
> + */
> +bool call_rcu_outstanding(void)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp;
> +	bool outstanding;
> +
> +	local_irq_save(flags);
> +	rdp = this_cpu_ptr(&rcu_data);
> +	outstanding = rcu_segcblist_empty(&rdp->cblist);
> +	local_irq_restore(flags);
> +
> +	return outstanding;
> +}
> +EXPORT_SYMBOL_GPL(call_rcu_outstanding);
> +
>  /*
>   * Helper function for call_rcu() and friends.  The cpu argument will
>   * normally be -1, indicating "currently running CPU".  It may specify