On Sun, 26 Aug 2018 20:26:09 -0700
Nadav Amit <nadav.amit@xxxxxxxxx> wrote:

> at 8:03 PM, Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> 
> > On Sun, 26 Aug 2018 11:09:58 +0200
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > 
> >> On Sat, Aug 25, 2018 at 09:21:22PM -0700, Andy Lutomirski wrote:
> >>> I just re-read text_poke(). It's, um, horrible. Not only is the
> >>> implementation overcomplicated and probably buggy, but it's SLOOOOOW.
> >>> It's totally the wrong API -- poking one instruction at a time
> >>> basically can't be efficient on x86. The API should either poke lots
> >>> of instructions at once or should be text_poke_begin(); ...;
> >>> text_poke_end();.
> >> 
> >> I don't think anybody ever cared about performance here. Only
> >> correctness. That whole text_poke_bp() thing is entirely tricky.
> > 
> > Agreed. Self modification is a special event.
> > 
> >> FWIW, before text_poke_bp(), text_poke() would only be used from
> >> stop_machine, so all the other CPUs would be stuck busy-waiting with
> >> IRQs disabled. These days, yeah, that's lots more dodgy, but yes
> >> text_mutex should be serializing all that.
> > 
> > I'm still not sure that speculative page-table walk can be done
> > over the mutex. Also, if the fixmap area is for aliasing
> > pages (which always mapped to memory), what kind of
> > security issue can happen?
> 
> The PTE is accessible from other cores, so just as we assume for L1TF that
> every addressable memory might be cached in L1, we should assume any
> PTE might be cached in the TLB when it is present.

OK, so other cores can accidentally cache the PTE in their TLBs
(and there is no way to shoot it down explicitly?)

> Although the mapping is for an alias, there are a couple of issues here.
> First, this alias mapping is writable, so it might allow an attacker to
> change the kernel code (following another initial attack).

Combined with some buffer overflow, correct? If the attacker can already
write kernel data directly, they are effectively in kernel mode.
> Second, the alias mapping is > never explicitly flushed. We may assume that once the original mapping is > removed/changed, a full TLB flush would take place, but there is no > guarantee it actually takes place. Hmm, would this means a full TLB flush will not flush alias mapping? (or, the full TLB flush just doesn't work?) > > Anyway, from the viewpoint of kprobes, either per-cpu fixmap or > > changing CR3 sounds good to me. I think we don't even need per-cpu, > > it can call a thread/function on a dedicated core (like the first > > boot processor) and wait :) This may prevent leakage of pte change > > to other cores. > > I implemented per-cpu fixmap, but I think that it makes more sense to take > peterz approach and set an entry in the PGD level. Per-CPU fixmap either > requires to pre-populate various levels in the page-table hierarchy, or > conditionally synchronize whenever module memory is allocated, since they > can share the same PGD, PUD & PMD. While usually the synchronization is not > needed, the possibility that synchronization is needed complicates locking. > Could you point which PeterZ approach you said? I guess it will be make a clone of PGD and use it for local page mapping (as new mm). If so, yes it sounds perfectly fine to me. > Anyhow, having fixed addresses for the fixmap can be used to circumvent > KASLR. I think text_poke doesn't mind using random address :) > I don’t think a dedicated core is needed. Anyhow there is a lock > (text_mutex), so use_mm() can be used after acquiring the mutex. Hmm, use_mm() said; /* * use_mm * Makes the calling kernel thread take on the specified * mm context. * (Note: this routine is intended to be called only * from a kernel thread context) */ So maybe we need a dedicated kernel thread for safeness? Thank you, -- Masami Hiramatsu <mhiramat@xxxxxxxxxx>