On Tue, Mar 12, 2019 at 10:56:20AM +0800, Jason Wang wrote:
>
> On 2019/3/11 9:43 PM, Andrea Arcangeli wrote:
> > On Mon, Mar 11, 2019 at 08:48:37AM -0400, Michael S. Tsirkin wrote:
> > > Using copyXuser is better I guess.
> > It certainly would be faster there, but I don't think it's needed if
> > that would be the only use case left that justifies supporting two
> > different models. On small 32bit systems with little RAM kmap won't
> > perform measurably different on 32bit or 64bit systems. If the 32bit
> > host has a lot of ram it all gets slow anyway at accessing RAM above
> > the direct mapping, if compared to 64bit host kernels, it's not just
> > an issue for vhost + mmu notifier + kmap and the best way to optimize
> > things is to run 64bit host kernels.
> >
> > Like Christoph pointed out, the main use case for retaining the
> > copy-user model would be CPUs with virtually indexed not physically
> > tagged data caches (they'll still suffer from the spectre-v1 fix,
> > although I exclude they have to suffer the SMAP
> > slowdown/feature). Those may require some additional flushing than the
> > current copy-user model requires.
> >
> > As a rule of thumb any arch where copy_user_page doesn't define as
> > copy_page will require some additional cache flushing after the
> > kmap. Supposedly with vmap, the vmap layer should have taken care of
> > that (I didn't verify that yet).
>
>
> vmap_page_range()/free_unmap_vmap_area() will call
> flush_cache_vmap()/flush_cache_vunmap(). So vmap layer should be ok.
>
> Thanks

You only unmap from mmu notifier though.
You don't do it after any access.

>
> >
> > There are some accessories like copy_to_user_page()
> > copy_from_user_page() that could work and obviously defines to raw
> > memcpy on x86 (the main cons is they don't provide word granular
> > access) and at least on sparc they're tailored to ptrace assumptions
> > so then we'd need to evaluate what happens if this is used outside of
> > ptrace context. kmap has been used generally either to access whole
> > pages (i.e. copy_user_page), so ptrace may actually be the only use
> > case with subpage granularity access.
> >
> > #define copy_to_user_page(vma, page, vaddr, dst, src, len)          \
> >         do {                                                        \
> >                 flush_cache_page(vma, vaddr, page_to_pfn(page));    \
> >                 memcpy(dst, src, len);                              \
> >                 flush_ptrace_access(vma, page, vaddr, src, len, 0); \
> >         } while (0)
> >
> > So I wouldn't rule out the need for a dual model, until we solve how
> > to run this stable on non-x86 arches with not physically tagged
> > caches.
> >
> > Thanks,
> > Andrea
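
To make the difference concrete, here is a rough userspace C sketch, not
kernel code. Every model_* name is hypothetical and the flush hooks are
stubs that only count calls, so it shows where the cache maintenance sits
in each model and says nothing about what real hardware does. The
per-access helper is shaped like the copy_to_user_page() macro quoted
above; the mapped helpers assume flushing happens only at map and unmap
time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts calls to the stubbed cache maintenance hooks (model only). */
static unsigned int nr_flushes;

/* Hypothetical stand-ins for the arch flush hooks; they only count. */
static void model_flush_before_access(void) { nr_flushes++; }
static void model_flush_after_access(void)  { nr_flushes++; }

/* Per-access model: flush, copy, flush, as in copy_to_user_page(). */
static void model_copy_per_access(void *dst, const void *src, size_t len)
{
	model_flush_before_access();
	memcpy(dst, src, len);
	model_flush_after_access();
}

/* Long-lived mapping model: maintenance only at map and unmap time. */
struct model_mapping {
	unsigned char *addr;
};

static void model_map(struct model_mapping *m, unsigned char *backing)
{
	m->addr = backing;
	nr_flushes++;	/* stands in for flush_cache_vmap() */
}

static void model_copy_mapped(struct model_mapping *m, size_t off,
			      const void *src, size_t len)
{
	assert(m->addr != NULL);
	memcpy(m->addr + off, src, len);	/* no maintenance per access */
}

static void model_unmap(struct model_mapping *m)
{
	nr_flushes++;	/* stands in for flush_cache_vunmap() */
	m->addr = NULL;
}
```

In this model, any number of model_copy_mapped() calls between
model_map() and model_unmap() adds no flushes at all, whereas every
model_copy_per_access() call adds two. If unmap only ever runs from the
mmu notifier, that gap between accesses is where an aliasing cache could
presumably go stale.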