----- On Apr 25, 2018, at 7:13 PM, Joel Fernandes joelaf@xxxxxxxxxx wrote:

> Hi Mathieu,
>
> On Wed, Apr 25, 2018 at 2:40 PM, Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> ----- On Apr 25, 2018, at 5:27 PM, Joel Fernandes joelaf@xxxxxxxxxx wrote:
>>
>>> On Tue, Apr 24, 2018 at 9:20 PM, Paul E. McKenney
>>> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>>> [..]
>>>>> >
>>>>> > Sounds good, thanks.
>>>>> >
>>>>> > Also I found the reason for my boot issue. It was because the
>>>>> > init_srcu_struct in the prototype was being done in an initcall.
>>>>> > Instead if I do it in start_kernel before the tracepoint is used, it
>>>>> > fixes it (although I don't know if this is dangerous to do like this
>>>>> > but I can get it to boot atleast.. Let me know if this isn't the
>>>>> > right way to do it, or if something else could go wrong)
>>>>> >
>>>>> > diff --git a/init/main.c b/init/main.c
>>>>> > index 34823072ef9e..ecc88319c6da 100644
>>>>> > --- a/init/main.c
>>>>> > +++ b/init/main.c
>>>>> > @@ -631,6 +631,7 @@ asmlinkage __visible void __init start_kernel(void)
>>>>> >         WARN(!irqs_disabled(), "Interrupts were enabled early\n");
>>>>> >         early_boot_irqs_disabled = false;
>>>>> >
>>>>> > +       init_srcu_struct(&tracepoint_srcu);
>>>>> >         lockdep_init_early();
>>>>> >
>>>>> >         local_irq_enable();
>>>>> > --
>>>>> >
>>>>> > I benchmarked it and the performance also looks quite good compared
>>>>> > to the rcu tracepoint version.
>>>>> >
>>>>> > If you, Paul and other think doing the init_srcu_struct like this
>>>>> > should be Ok, then I can try to work more on your srcu prototype and
>>>>> > roll into my series and post them in the next RFC series (or let me
>>>>> > know if you wanted to work your srcu stuff in a separate series..).
>>>>>
>>>>> That is definitely not what I was expecting, but let's see if it works
>>>>> anyway...
>>>>> ;-)
>>>>>
>>>>> But first, I was instead expecting something like this:
>>>>>
>>>>>         DEFINE_SRCU(tracepoint_srcu);
>>>>>
>>>>> With this approach, some of the initialization happens at compile time
>>>>> and the rest happens at the first call_srcu().
>>>>>
>>>>> This will work -only- if the first call_srcu() doesn't happen until after
>>>>> workqueue_init_early() has been invoked. Which I believe must have been
>>>>> the case in your testing, because otherwise it looks like __call_srcu()
>>>>> would have complained bitterly.
>>>>>
>>>>> On the other hand, if you need to invoke call_srcu() before the call
>>>>> to workqueue_init_early(), then you need the patch that I am beating
>>>>> into shape. Plus you would need to use DEFINE_SRCU() and to avoid
>>>>> invoking init_srcu_struct().
>>>>
>>>> And here is the patch. I do not intend to send it upstream unless it
>>>> actually proves necessary, and it appears that current SRCU does what
>>>> you need.
>>>>
>>>> You would only need this patch if you wanted to invoke call_srcu()
>>>> before workqueue_init_early() was called, which does not seem likely.
>>>
>>> Cool. So I was chatting with Paul and just to update everyone as well,
>>> I tried the DEFINE_SRCU instead of the late init_srcu_struct call and
>>> can make it past boot too (thanks Paul!). Also I don't see a reason we
>>> need the RCU callback to execute early and its fine if it runs later.
>>>
>>> Also, I was thinking of introducing a separate trace_*event*_srcu API
>>> as a replacement to the _rcuidle API. Then I can make use of it for my
>>> tracepoints, and then later can use it for the other tracepoints
>>> needing _rcuidle. After that we can finally get rid of the _rcuidle
>>> API if there are no other users of it. This is just a rough plan, but
>>> let me know if there's any issue with this plan that you can think
>>> off.
>>> IMO, I believe its simpler if the caller worries about whether it can
>>> tolerate if tracepoint probes can block or not, than making it a
>>> property of the tracepoint. That would also simplify the patch to
>>> introduce srcu and keep the tracepoint creation API simple and less
>>> confusing, but let me know if I'm missing something about this.
>>
>> One problem with your approach is that you can have multiple callers
>> for the same tracepoint name, where some could be non-preemptible and
>> others blocking. Also, there is then no clear way for the callback
>
> Shouldn't it be responsibility of the caller to make sure it calls
> correct API? So if you're wanting to allow probes to block, then you'd
> call trace*blocking, if not then you don't. So the caller side can
> just always do the right thing. That's a caller side issue.

The issue there is that tracepoint.c has APIs both for instrumentation
and for registration of probe providers (callbacks). I want tracepoint.c
to provide guarantees that it won't connect incompatible probes and
callsites together.

>
>>
>> Regarding the name, I'm OK with having something along the lines of
>> trace_*event*_blocking or such. Please don't use "srcu" or other naming
>> that is explicitly tied to the underlying mechanism used internally
>> however: what we want to convey is that this specific tracepoint probe
>
> Problem is that _blocking isn't the right word either. In my IRQ trace
> point case, it will look something like this then:
>
> local_irq_disable();
> // IRQs are now off.
> trace_irq_disable_blocking(..);
>
> This wouldn't make sense. What we really want is to use the SRCU
> implementation so that its low overhead...
>
> So it would be something like:
>
> local_irq_disable();
> // IRQs are now off.
> trace_irq_disable_srcu(..);
>
> I also Ok if, as Paul was saying in his last email, that just for
> _rcuidle, we use SRCU so that we don't have to do the rcu_enter_irq
> stuff.
> Or we kill the _rcuidle API completely and use _srcu for those
> users instead. We already have 1 implementation specific name anyway
> (rcuidle), we're just replacing it with another one. If in the future,
> if we want to change that name we can always do so (Also if you will,
> correcting the existing already bad naming is a different problem and
> we're not making it any worse tbh).

Using SRCU rather than the sched-rcu tracepoint synchronization in your
use-case is caused by a limitation of sched-rcu: it cannot be efficiently
used within idle code. So you don't care about the "can_sleep" property
of SRCU. You could even mix SRCU and sched-rcu callsites for the same
probe name, and it would be perfectly valid.

So even though both "can_sleep" and "rcuidle" caller variants would end
up using SRCU under the hood, each can have its own caller API, e.g.:

* trace_<event>() -> only non-sleeping probes can register to those.
  Uses sched-rcu under the hood.
* trace_<event>_can_sleep() -> both sleeping and non-sleeping probes can
  register to those. Uses SRCU under the hood.
* trace_<event>_rcuidle() -> only non-sleeping probes can register to
  those. Uses SRCU under the hood.

>
>> can be preempted and block. The underlying implementation could move to
>> a different RCU flavor brand in the future, and it should not impact
>> users of the tracepoint APIs.
>>
>> In order to ensure that probes that may block only register themselves
>> to tracepoints that allow blocking, we should introduce new tracepoint
>> declaration/definition *and* registration APIs also contain the
>> "BLOCKING/blocking" keywords (or such), so we can ensure that a
>> tracepoint probe being registered to a "blocking" tracepoint is indeed
>> allowed to block.
>
> I feel this problem you're describing is slightly out of the scope of
> the issues we're talking about, I think. Even right now, someone can
> write a callback that blocks and then bad things will happen.
> If I understand correctly, all callbacks right now will execute in a
> preempt disabled section because of rcu_read_lock_sched. So we already
> have a problem (without the SRCU changes) that if a callback blocks,
> then we'll have hard to diagnose sleeping while atomic issues. Sorry
> if I missed your point.

The current situation is that no callback whatsoever can sleep. If we
introduce an API allowing some callbacks to sleep, I want to make sure
we don't end up registering sleepable callbacks to non-preemptible
callsites. Considering that the callback can be provided by a kernel
module whereas the callsite is within the kernel, having this kind of
correctness validation within tracepoint.c appears important.

Thanks!

Mathieu

>
> thanks,
>
> - Joel

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
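
To make the registration-time check discussed above a bit more concrete, here is a rough user-space sketch of the rule (a probe that may sleep is only accepted by a tracepoint that allows sleeping). It is only an illustration: every name in it (tp_flavor, tp_model, probe_model, tp_model_register) is hypothetical and does not exist in kernel/tracepoint.c, and it assumes, as a simplification, that the flavor is a single property declared per tracepoint.

```c
/* User-space model only; none of these names exist in the kernel. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flavors, mirroring the three caller variants above. */
enum tp_flavor {
	TP_SCHED_RCU,	/* trace_<event>(): non-sleeping probes only */
	TP_CAN_SLEEP,	/* trace_<event>_can_sleep(): probes may sleep */
	TP_RCUIDLE,	/* trace_<event>_rcuidle(): non-sleeping probes only */
};

/* Hypothetical tracepoint descriptor (assumption: one flavor per name). */
struct tp_model {
	const char *name;
	enum tp_flavor flavor;
};

/* Hypothetical probe descriptor; may_sleep is declared by the provider. */
struct probe_model {
	const char *name;
	bool may_sleep;
};

/*
 * Accept the probe unless it may sleep and the tracepoint does not
 * allow sleeping. Returns 0 on success, -EINVAL on a mismatch.
 */
static int tp_model_register(const struct tp_model *tp,
			     const struct probe_model *probe)
{
	assert(tp != NULL && probe != NULL);

	if (probe->may_sleep && tp->flavor != TP_CAN_SLEEP)
		return -EINVAL;
	return 0;
}
```

With a check of this shape, a module-provided probe that declares itself as sleepable would be refused at registration time for the plain and _rcuidle variants, rather than triggering a sleeping-while-atomic problem later at the callsite.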