I have spent some time trying to come up with a way to reduce the world-switch penalty when we run on the ARMv6 architecture.

Background
---------------------
The Qualcomm MSM7201A's core is part of the ARM11 family (specifically the ARM1136EJ-S). These cores have virtually indexed, physically tagged L1 caches.

The TLB on these cores is split into a two-level implementation: a MicroTLB and the main TLB. The division is not important here, since the main difference is that the main TLB is slightly slower and supports locking of entries.

Entries in both TLB layers have an ASID field in addition to the virtual address used for lookups. A mapping can be either global (nG bit = 0 in the page tables) or belong to a specific ASID. The specific ASID is taken from the CP15 Context ID register at the time the entry is first placed in the TLB.

Goal
---------------------
The optimal solution would be to avoid both cache flushes and TLB flushes across all world-switches to the guest, and even upon return to user space.

Linux and ASID
---------------------
Linux uses a global mapping for all kernel addresses (> TASK_SIZE), as the kernel virtual address space is shared across all processes. Further, it associates an ASID with every process (stored in the mm_struct's context.id field). The ASID field is 8 bits wide, which means that when many processes are running and being scheduled often, the available ASIDs eventually run out; the TLBs are then flushed and ASIDs are assigned from scratch.

KVM
---------------------
Since the caches are physically tagged, we have no problems with homonyms. However, some degree of aliasing may occur if we, for example, access guest data from the host kernel. (Aliasing can be mitigated with page coloring, but that discipline is probably going to be problematic in the Linux kernel, as we don't control the address space that much.)

Avoiding TLB flushes altogether is not going to be easy.
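
To make the TLB problem concrete, here is a rough sketch of the match rule described in the background section: an entry hits when the virtual page matches and the entry is either global or tagged with the current ASID. This is only an illustrative model, not the actual hardware logic; the struct, its field names, and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one TLB entry; field names are illustrative only. */
struct tlb_entry {
    uint32_t vpn;    /* virtual page number (VA >> 12 for 4 KB pages) */
    uint32_t pfn;    /* physical frame number */
    uint8_t  asid;   /* ASID latched from CP15 when the entry was filled */
    bool     global; /* true when the page table entry had nG = 0 */
    bool     valid;
};

/* An entry hits when the virtual page matches and the entry is either
 * global or tagged with the currently active ASID. */
static bool tlb_entry_matches(const struct tlb_entry *e,
                              uint32_t vpn, uint8_t cur_asid)
{
    return e->valid && e->vpn == vpn &&
           (e->global || e->asid == cur_asid);
}

/* Linear search standing in for the hardware's associative lookup.
 * Returns the first matching entry, or NULL on a TLB miss. */
static const struct tlb_entry *tlb_lookup(const struct tlb_entry *tlb,
                                          size_t n, uint32_t vpn,
                                          uint8_t cur_asid)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tlb_entry_matches(&tlb[i], vpn, cur_asid))
            return &tlb[i];
    }
    return NULL;
}
```

In this model, a global entry for a kernel address hits no matter which ASID is current, so switching to a guest ASID alone would not hide stale host kernel translations from a guest kernel running at the same virtual addresses.
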
Since the host kernel is mapped globally and the guest kernel uses the same virtual addresses, we cannot easily avoid a TLB flush. However, two options present themselves:

1. Live with TLB flushes whenever entering the guest kernel, and map only guest user processes with a separate ASID.

2. Change the KVM process' page table on the host to use a dedicated ASID for kernel mappings. (This is a bit intrusive, and I'm worried about the attitude from kernel developers and about problems with module loading and other remapping in the kernel.)

It's hard to speculate whether option 2 is going to be worth the effort, but depending on the performance we see, it might be a necessary evil.

Any thoughts?

Best,
Christoffer