On Tue, Feb 02, 2021 at 02:33:38PM -0800, Nick Desaulniers wrote:
> On Mon, Feb 1, 2021 at 4:02 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >
> > On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote:
> > > On the earlier thread, Julien writes:
> > >
> > > >> I think most people interested in livepatching are using GCC built
> > > >> kernels, but I could be mistaken (although in the long run, both
> > > >> compilers should be supported, and yes, I realize the objtool solution
> > > >> currently only would support GCC).
> > >
> > > Google's production kernels are using livepatching and are built with
> > > Clang. Getting similar functionality working for arm64 would be of
> > > interest.
> >
> > Well, that's cool. I had no idea.
> >
> > I'm curious how they're generating livepatch modules? Because
> > kpatch-build doesn't support Clang (AFAIK), and if they're not using
> > kpatch-build then there are some traps to look out for.
>
> Ok, I just met with a bunch of folks that are actively working on
> this. Let me intro
> Yonghyun Hwang <yonghyun@xxxxxxxxxx>
> Pete Swain <swine@xxxxxxxxxx>
> who will be the folks on point for this from Google.

Nice to meet you all. Adding the live-patching ML to this sub-thread.

> My understanding after some clarifications today is that Google is
> currently using a proprietary kernel patching mechanism that was developed
> around a decade ago, "pre-ksplice Oracle acquisition." But we are
> looking to transition to kpatch, and help towards supporting arm64.
> Live patching is important for deploying kernel fixes faster than
> predetermined scheduled draining of jobs in clusters.
>
> The first step for the kpatch transition is supporting builds with Clang.
> Yonghyun is working on that and my hope is he will have patches for
> you for that soon.

That would be great!

> Curiously, the proprietary mechanism doesn't rely on stack validation.

If this proprietary mechanism relies on stack traces, that could be
problematic.
Livepatch originally made the same assumption, but it was shot down quickly:

  https://lwn.net/Articles/634649/
  https://lwn.net/Articles/658333/

> I think that such a dependency was questioned on the cover letter
> patch's thread as well.

Yes, though it's generally agreed that unvalidated compiler-generated
unwinder metadata isn't going to be robust enough for kernel live
patching.

> Maybe there's "some traps to look out for" you're referring to there?

The "traps" are more about how the patches are generated. If they're
built from source code, like a normal kernel module, you have to be
extra careful because of function ABI nastiness. kpatch-build avoids
this problem.

Unfortunately this still isn't documented.

> I'm not privy to the details, though I would guess it has to do with
> ensuring kernel threads aren't executing (or planning to return
> through) code regions that are trying to be patched/unpatched.

Right. There are some good details in
Documentation/livepatch/livepatch.rst.

> I am curious about frame pointers never being omitted for arm64; is
> frame pointer chasing unreliable in certain contexts?

Yes, problematic areas are interrupts, exceptions, inline asm, and
hand-coded asm. A nice document was recently added in
Documentation/livepatch/reliable-stacktrace.rst which covers a lot of
this stuff.

> The internal functionality has been used heavily in production for
> almost a decade, though without it being public or supporting arm64;
> I'm not sure precisely how they solve such issues (or how others might
> review such an approach).

Very impressive to run it in production that long. Their experience
and expertise are definitely welcome.

> Either way, the dependencies for live patching are less important, so
> long as they are toolchain portable. The ability to live patch kernel
> images is ___important___ to Google.
>
> > > Objtool support on arm64 is interesting to me though, because it has
> > > found bugs in LLVM codegen. That alone is extremely valuable.
> > > But it's not helpful if it's predicated or tightly coupled to GCC, as
> > > this series appears to do.
> >
> > I agree 100%, if there are actual Clang livepatch users (which it sounds
> > like there are) then we should target both compilers.
>
> Or will be. (Sorry, I didn't know we hadn't completed the transition
> to kpatch yet. It is "the opposite side of the house" from where I
> work; I literally have 8 bosses, not kidding).
>
> Though if kpatch moves to requiring GCC plugins for architectures we
> use extensively or would like to use more of, that's probably going to
> throw a wrench in multiple transition plans. (The fleet's transition
> to Clang is done, I'm not worried about that).

Hopefully we can just forget the GCC plugin idea.

It would be really nice to see some performance numbers for
-fno-jump-tables so we can justify doing that instead, at least in the
short term. I'd suspect the difference isn't measurable in the real
world. (In the case of GCC+retpolines, it would be a performance
improvement.)

> > And yes, objtool has been pretty good at finding compiler bugs, so the
> > more coverage the better.
> >
> > > The idea of rebuilding control flow from binary analysis and using
> > > that to find codegen bugs is a really cool idea (novel, even? idk),
> > > and I wish we had some analog for userspace binaries that could
> > > perform similar checks.
> >
> > Objtool is generic in many ways -- in fact I recently heard from a PhD
> > candidate who used it successfully on another kernel for an ORC
> > unwinder.
>
> That's pretty cool! Reuse outside the initial context is always a
> good sign that something was designed right.

So basically you're saying objtool is both useful and well-designed. I
will quote you on that!

--
Josh
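A minimal illustration of the -fno-jump-tables point in the thread, offered only as a sketch: the function `apply_op` is hypothetical and does not come from the kernel or from the arm64 objtool series. A dense switch whose cases each compute something different from the input is the kind of code GCC and Clang may lower to an indirect jump through a table of code addresses. Objtool, which rebuilds control flow from the object file, then has to work out every possible target of that indirect jump. Building with -fno-jump-tables asks the compiler for compares and direct branches instead, so there is no table to decode.

```c
/*
 * Hypothetical example: a dense switch over consecutive case values.
 * Each case computes something different from 'x', so the compiler
 * cannot fold the switch into a table of constant results. With
 * optimization enabled, GCC and Clang will often lower it to an
 * indirect jump through a table of code addresses (a "jump table").
 * Compiling with -fno-jump-tables asks for compares and direct
 * branches instead. The behaviour of the function is identical
 * either way; only the generated control flow differs.
 */
int apply_op(unsigned int op, int x)
{
    switch (op) {
    case 0: return x + 1;
    case 1: return x * 3;
    case 2: return x - 7;
    case 3: return x * 4;
    case 4: return x ^ 0x55;
    case 5: return x / 3;
    case 6: return -x;
    case 7: return x * x;
    default: return 0; /* unknown op */
    }
}
```

Whether a table is actually emitted depends on the compiler version, target, and optimization level. One way to check is to compare `gcc -O2 -S` output with and without `-fno-jump-tables` (or the same with `clang`) and look for an indirect jump, such as `jmp *...` on x86-64 or `br` on arm64.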