> On May 14, 2019, at 1:25 AM, Alexandre Chartre <alexandre.chartre@xxxxxxxxxx> wrote:
>
>
>> On 5/14/19 9:09 AM, Peter Zijlstra wrote:
>>> On Mon, May 13, 2019 at 11:18:41AM -0700, Andy Lutomirski wrote:
>>> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
>>> <alexandre.chartre@xxxxxxxxxx> wrote:
>>>>
>>>> pcpu_base_addr is already mapped to the KVM address space, but this
>>>> represents the first percpu chunk. To access a per-cpu buffer not
>>>> allocated in the first chunk, add a function which maps all cpu
>>>> buffers corresponding to that per-cpu buffer.
>>>>
>>>> Also add a function to clear page table entries for a percpu buffer.
>>>>
>>>
>>> This needs some kind of clarification so that readers can tell whether
>>> you're trying to map all percpu memory or just map a specific
>>> variable. In either case, you're making a dubious assumption that
>>> percpu memory contains no secrets.
>> I'm thinking the per-cpu random pool is a secret. IOW, it demonstrably
>> does contain secrets, invalidating that premise.
>
> The current code unconditionally maps the entire first percpu chunk
> (pcpu_base_addr), so it assumes it doesn't contain any secret. That is
> mainly a simplification for the POC, because a lot of core information
> that we need, for example just to switch mm, is stored there (like
> cpu_tlbstate, current_task...).

I don’t think you should need any of this.

>
> If the entire first percpu chunk effectively has secrets, then we will
> need to individually map only the buffers we need. The
> kvm_copy_percpu_mapping() function is added to copy the mapping for a
> specified percpu buffer, so it is used to map percpu buffers which are
> not in the first percpu chunk.
>
> Also note that mapping is constrained by PTE granularity (4K), so mapped
> buffers (percpu or not) which do not fill a whole set of pages can leak
> adjacent data stored on the same pages.
>

I would take a different approach: figure out what you need and put it
in its own dedicated area, kind of like cpu_entry_area.

One nasty issue you’ll have is vmalloc: the kernel stack is in the vmap
range, and, if you allow access to vmap memory at all, you’ll need some
way to ensure that *unmap* gets propagated. I suspect the right choice
is to see if you can avoid using the kernel stack at all in isolated
mode. Maybe you could run on the IRQ stack instead.
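
To make the dedicated-area idea concrete, here is a rough sketch, not the
actual patch: all names below are made up, and kvm_copy_ptes() is assumed
to be a helper that copies the kernel PTEs covering a range into the KVM
page table. The point is to gather the state needed in isolated mode into
one page-aligned per-cpu structure and map only that, which also avoids
the 4K-granularity leak of adjacent data mentioned above.

        /*
         * Sketch only.  Everything actually needed while running in
         * isolated mode lives in this structure.  Page alignment and
         * page-sized padding ensure nothing else shares its pages.
         */
        struct kvm_isolation_data {
                unsigned long isolated_cr3;
                struct task_struct *current_task;
        } __aligned(PAGE_SIZE);

        static DEFINE_PER_CPU_PAGE_ALIGNED(struct kvm_isolation_data,
                                           kvm_isolation_data);

        static int kvm_map_isolation_data(void)
        {
                int cpu, err;

                for_each_possible_cpu(cpu) {
                        /* copy only the PTEs covering this CPU's copy */
                        err = kvm_copy_ptes(per_cpu_ptr(&kvm_isolation_data, cpu),
                                            sizeof(struct kvm_isolation_data));
                        if (err)
                                return err;
                }
                return 0;
        }

Then nothing outside this structure (and whatever cpu_entry_area-like
ranges you already map) needs to be visible in the isolated page table.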