On Wed, Mar 11, 2020 at 05:59:53PM +0100, Arnd Bergmann wrote:
> On Wed, Mar 11, 2020 at 3:29 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > > > - Flip TTBR0 on kernel entry/exit, and again during user access.
> > >
> > > This is probably more work to implement than your idea, but
> > > I would hope this has a lower overhead on most microarchitectures
> > > as it doesn't require pinning the pages. Depending on the
> > > microarchitecture, I'd hope the overhead would be comparable
> > > to that of ARM64_SW_TTBR0_PAN.
> >
> > This still doesn't solve the copy_{from,to}_user() case where both
> > address spaces need to be available during the copy. So you either pin
> > the user pages in memory and access them via the kernel mapping or you
> > temporarily map (kmap?) the destination/source kernel address. The
> > overhead I'd expect to be significantly greater than ARM64_SW_TTBR0_PAN
> > for the uaccess routines. For user entry/exit, your suggestion is
> > probably comparable with SW PAN.
>
> Good point, that is indeed a larger overhead. The simplest implementation
> I had in mind would use the code from arch/arm/lib/copy_from_user.S and
> flip ttbr0 between each ldm and stm (up to 32 bytes), but I have no idea
> of the cost of storing to ttbr0, so this might be even more expensive. Do
> you have an estimate of how long writing to TTBR0_64 takes on Cortex-A7
> and A15, respectively?

I don't have numbers but it's usually not cheap since you need an ISB
to synchronise the context after a TTBR0 update (basically flushing
the pipeline).

> Another way might be to use a temporary buffer that is already
> mapped, and add a memcpy() through the L1 cache to reduce the number
> of ttbr0 changes. The buffer would probably have to be on the stack,
> which limits the size, but for large copies get_user_pages()+memcpy()
> may end up being faster anyway.

IIRC, the x86 attempt from Ingo some years ago used get_user_pages()
for uaccess. Depending on the size of the buffer, this may be faster
than copying twice.

-- 
Catalin
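
For readers estimating the cost discussed above: a minimal sketch,
assuming an ARMv7 LPAE system, of what each ttbr0 write in the copy
loop would involve. The helper name is illustrative, not code from
the thread; the expensive part is the ISB Catalin mentions, which
serialises the context before the next access can issue.

#include <linux/types.h>

/*
 * Illustrative only: write the 64-bit TTBR0 used with LPAE and
 * synchronise the context. %Q0/%R0 are the low and high 32-bit
 * halves of the 64-bit operand.
 */
static inline void flip_ttbr0(u64 ttbr)
{
	asm volatile(
		"mcrr	p15, 0, %Q0, %R0, c2\n\t"	/* write TTBR0_64 */
		"isb"			/* synchronise: flushes the pipeline */
		: : "r" (ttbr) : "memory");
}

With ttbr0 flipped between each ldm and stm as Arnd describes, that
is two such writes (and two ISBs) per 32-byte block.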
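
And a rough sketch, again only illustrative, of the
get_user_pages()+memcpy() shape mentioned at the end: pin the user
pages, then copy through the kernel's own mapping so TTBR0 never has
to point at the user tables during the copy. The function name and
error handling are simplified assumptions.

#include <linux/highmem.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/string.h>

/*
 * Illustrative only: returns the number of bytes NOT copied, like
 * copy_from_user().
 */
static unsigned long copy_from_user_pinned(void *to,
					   const void __user *from,
					   unsigned long n)
{
	unsigned long uaddr = (unsigned long)from;

	while (n) {
		unsigned long offset = uaddr & ~PAGE_MASK;
		unsigned long chunk = min_t(unsigned long, n,
					    PAGE_SIZE - offset);
		struct page *page;
		void *kaddr;

		/* Pin one user page; gup_flags == 0: read-only access. */
		if (get_user_pages_fast(uaddr & PAGE_MASK, 1, 0, &page) != 1)
			return n;

		/* Copy via the kernel mapping (kmap handles highmem). */
		kaddr = kmap(page);
		memcpy(to, kaddr + offset, chunk);
		kunmap(page);
		put_page(page);

		to += chunk;
		uaddr += chunk;
		n -= chunk;
	}
	return 0;
}

The copy itself then runs entirely on the kernel page tables, at the
cost of a pin/unpin per page, which is why it would only win for
larger copies.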