Thanks Peter!

> On Oct 14, 2019, at 2:09 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Oct 10, 2019 at 06:06:14PM +0000, Alexei Starovoitov wrote:
>> On 10/10/19 10:46 AM, Peter Zijlstra wrote:
>
>>> All of stack_map_get_build_id_offset() is just disgusting games; I did
>>> tell you guys how to do lockless vma lookups a few years ago -- and yes,
>>> that is invasive core mm surgery. But this is just disgusting hacks for
>>> not wanting to do it right.
>>
>> you mean speculative page fault stuff?
>> That was my hope as well and I offered Laurent all the help to land it.
>> Yet after a year since we've talked the patches are not any closer
>> to landing.
>> Any other 'invasive mm surgery' you have in mind?
>
> Indeed that series. It had RCU managed VMAs and lockless VMA lookups,
> which is exactly what you need here.

Lockless VMA lookups would be really useful; they would resolve all the
pain we are having here. I remember folks from Google also mentioned at
LPC that they would like a better mechanism to confirm build-ids in perf.

>>> Basically the only semi-sane thing to do with that trainwreck is
>>> s/in_nmi()/true/ and pray.
>>>
>>> On top of that I just hate build-ids in general.
>>
>> Emotions aside... build_id is useful and used in production.
>> It's used widely because it solves real problems.
>
> AFAIU it solves the problem of you not knowing what version of the
> binary runs where; which I was hoping your cloud infrastructure thing
> would actually know already.
>
> Anyway, I know what it does, I just don't necessarily agree it is the
> right way around that particular problem (also, the way I'm personally
> affected is that perf-record is dead slow by default due to build-id
> post processing).
>
> And it obviously leads to horrible hacks like the code currently under
> discussion :/
>
>> This deadlock is from real servers and not from some sanitizer wannabe.
>
> If you enable CFS bandwidth control and run this function on the
> trace_hrtimer_start() tracepoint, you should be able to trigger a real
> AB-BA lockup.
>
>> Hence we need to fix it as cleanly as possible and quickly.
>> s/in_nmi/true/ is certainly an option.
>
> That is the best option; because tracepoints / perf-overflow handlers
> really should not be taking any locks.
>
>> I'm worried about the overhead of doing irq_work_queue() all the time.
>> But I'm not familiar enough with the mechanism to justify the concerns.
>> Would it make sense to do s/in_nmi/irqs_disabled/ instead?
>
> irqs_disabled() should work in this particular case because rq->lock
> (and therefore all its nested locks) are IRQ-safe.

We are worried about the overhead of doing irq_work for every single
stackmap lookup, so we would like to go with the irqs_disabled() check.
I just sent v2 of the patch.

Thanks again,
Song
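
P.S. For the archives, a minimal sketch of the direction v2 takes
(heavily abbreviated; stack_map_get_build_id_offset() is the real
function, but the irq_work wrapper type, the per-cpu variable, and the
field names below are illustrative, not copied from the patch):

/*
 * up_read(&mm->mmap_sem) can take rq->lock via the rwsem wakeup path.
 * If the BPF program is already running with rq->lock held (e.g. from a
 * scheduler tracepoint) and IRQs disabled, that is the AB-BA deadlock
 * Peter describes. So: only release mmap_sem directly when IRQs are
 * enabled; otherwise defer the up_read() to irq_work, and if the
 * per-cpu irq_work slot is busy, fall back to reporting raw ips.
 */
static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
					  u64 *ips, u32 trace_nr, bool user)
{
	struct stack_map_irq_work *work = NULL;
	bool irq_work_busy = false;
	int i;

	if (irqs_disabled()) {
		work = this_cpu_ptr(&up_read_work);
		if (work->irq_work.flags & IRQ_WORK_BUSY)
			irq_work_busy = true;	/* slot taken, fall back */
	}

	if (!user || !current || !current->mm || irq_work_busy ||
	    down_read_trylock(&current->mm->mmap_sem) == 0) {
		/* cannot walk the VMAs safely: report plain ips instead */
		for (i = 0; i < trace_nr; i++) {
			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
			id_offs[i].ip = ips[i];
		}
		return;
	}

	/* ... find_vma() + build-id extraction per frame, as before ... */

	if (!work) {
		up_read(&current->mm->mmap_sem);
	} else {
		work->sem = &current->mm->mmap_sem;
		irq_work_queue(&work->irq_work);	/* up_read() later */
	}
}

Note the irq_work is only queued on the (rare) irqs-disabled path, so
the common case pays nothing beyond the irqs_disabled() check, which is
what addresses the overhead concern above.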