On 05/05/2017, 09:57 PM, Linus Torvalds wrote: > On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby <jslaby@xxxxxxx> wrote: >> The DWARF unwinder is in place and ready. So introduce the config option >> to allow users to enable it. It is by default off due to missing >> assembly annotations. > > Who actually ends up using this? Every SUSE user has been using this for almost a decade and we are not about to switch to FP for performance reasons as noted by Jiri Kosina. So SUSE users are going to be exposed to DWARF unwinder for another decade or so at least. Therefore, this is another attempt to make the unwinder (in some form) upstream. Since this was first proposed many years ago, we have been forced to forward-port it over and over and everyone knows what pain it is. So it is nice, that this opened the discussion at least. > Because from the last time we had fancy unwindoers, and all the > problems it caused for oops handling with absolutely _zero_ upsides > ever, I do not ever again want to see fancy unwinders with complex > state machine handling used by the oopsing code. Well, reliable stack-traces with minimal performance impact thanks to out-of-band data is hell good reason in my opinion. > The fact that it gets disabled for KASAN also makes me suspicious. It > basically means that now all the accesses it does are not even > validated. OK, I inclined to disable KASAN when I started cleaning this up for _performance_ reasons. The system was so slow, that the RCU stall or soft-lockup detectors came up complaining. From that time, I measured the bottlenecks and optimized the unwinder so that 1000 iterations of unwinding takes: Before: real 0m1.808s user 0m0.001s sys 0m1.807s After: real 0m0.018s user 0m0.001s sys 0m0.017s So let me check, whether KASAN still has to be disabled globally. I do not think so. OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as holds now for the rest of the current unwinding: KASAN_SANITIZE_dumpstack.o := n KASAN_SANITIZE_dumpstack_$(BITS).o := n KASAN_SANITIZE_stacktrace.o := n Still, I can let KASAN := y for dwarf.c for testing purposes locally and smoke-test the unwinder. > The fact that the most of the code seems to be disabled for the first > six patches, and then just enabled in the last patch, also seems to > mean that the series also gets no bisection coverage or testing that > the individual patches make any sense. (ie there's a lot of code > inside "CONFIG_DWARF_UNWIND" in the early patches but that config > option cannot even be enabled until the last patch). Correct. This was one big patch previously. I separated that patch into several smaller commits touching different places of the kernel for easier review. It does not make sense to test any of the patches separately except the first. Hence the config option which enables the rest of the series is the last one. I deemed this as one of possible approaches to split patches (I have seen this many times in the past.) I can of course squash this back into a single patch (or two). > We used to have nasty issues with not just missing dwarf info, but > also actively *wrong* dwarf info. Compiler versions that generated > subtly wrong info, because nobody actually really depended on it, and > the people who had tested it seldom did the kinds of things we do in > the kernel (eg inline asms etc). I must admit I am not aware of any issues in this manner during the years. Again, this unwinder is the default in SUSE kernels since ever, so we have been using gcc from at least 3.2 to 7. But see below. > So I'm personally still very suspicious of these things. > > Last time I had huge issues with people also always blaming *anything* > else than that unwinder. It was always "oh, somebody wrote asm without > getting it right". Or "oh, the compiler generated bad tables, it's not > *my* fault that now the kernel oopsing code no longer works". Now we have objtool. My objtool clone: 1) verifies the DWARF data (prepared by Josh) 2) generates DWARF data for assembly -- incomplete yet: see the thread about x86 assembly cleanup which is a pre-requisite for this to work. This is BTW the reason why the DWARF unwinder is default-off in this series yet. And we can add: 3) fix up the data, if they are wrong That said, objtool could handle the data so they are correct and as-expected for the unwinder. Without objtool, the data (and unwinder) is hopeless (only vdso from all the assembly is annotated.) > When I asked for more stricter debug table validation to avoid issues, > it was always "oh, we fixed it, no worries", and then two months later > somebody hit another issue. Reasonable, indeed. I am all for strict checking. objtool is to do that. > Put another way; the last time we did crazy stuff like this, it got > reverted. For a damn good reason, despite some people being in denial > about those reasons. Speaking for myself, having it out-of-tree causes me only troubles with fwd-porting. So I am all ears to find a path to make this upstream and maintain this there according to opinions of general kernel-community. (Which reminds me I didn't add an entry to the MAINTAINERS file.) thanks, -- js suse labs -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html