On Fri, Oct 05, 2018 at 02:54:28PM +1000, David Gibson wrote:
> On Fri, Oct 05, 2018 at 02:23:50PM +1000, Paul Mackerras wrote:
> > On Fri, Oct 05, 2018 at 02:09:08PM +1000, David Gibson wrote:
> > > On Thu, Oct 04, 2018 at 09:56:02PM +1000, Paul Mackerras wrote:
> > > > From: Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>
> > > >
> > > > This is only done at level 0, since only level 0 knows which physical
> > > > CPU a vcpu is running on.  This does for nested guests what L0 already
> > > > did for its own guests, which is to flush the TLB on a pCPU when it
> > > > goes to run a vCPU there, and there is another vCPU in the same VM
> > > > which previously ran on this pCPU and has now started to run on another
> > > > pCPU.  This is to handle the situation where the other vCPU touched
> > > > a mapping, moved to another pCPU and did a tlbiel (local-only tlbie)
> > > > on that new pCPU and thus left behind a stale TLB entry on this pCPU.
> > > >
> > > > This introduces a limit on the vcpu_token values used in the
> > > > H_ENTER_NESTED hcall -- they must now be less than NR_CPUS.
> > >
> > > This does make the vcpu tokens no longer entirely opaque to the L0.
> > > It works for now, because the only L1 is Linux and we know basically
> > > how it allocates those tokens.  Eventually we probably want some way
> > > to either remove this restriction or to advertise the limit to the L1.
> >
> > Right, we could use something like a hash table and have it be
> > basically just as efficient as the array when the set of IDs is dense
> > while also handling arbitrary ID values.  (We'd have to make sure that
> > L1 couldn't trigger unbounded memory consumption in L0, though.)
>
> Another approach would be to sacrifice some performance for L0
> simplicity: when an L1 vCPU changes pCPU, flush all the nested LPIDs
> associated with that L1.  When an L2 vCPU changes L1 vCPU (and
> therefore, indirectly pCPU), the L1 would be responsible for flushing
> it.

That was one of the approaches I considered initially, but it has
complexities that aren't apparent, and it could be quite inefficient
for a guest with a lot of nested guests.  For a start you have to
provide a way for L1 to flush the TLB for another LPID, which guests
can't do themselves (it's a hypervisor privileged operation).  Then
there's the fact that it's not the pCPU where the moving vCPU has
moved to that needs the flush, it's the pCPU that it moved from (where
presumably something else is now running).

All in all, the simplest solution was to have L0 do it, because L0
knows unambiguously the real physical CPU where any given vCPU last
ran.

Paul.
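
For readers following the thread, the bookkeeping described above can be modelled roughly as below. This is a simplified user-space sketch, not the code from the patch: the struct, the function names, and the small NR_PCPUS and NR_TOKENS constants are all hypothetical, and the "flush" is only a counter. It assumes L0 keeps, per nested guest, an array indexed by vcpu_token recording the pCPU where that vCPU last ran (hence the bound on token values), plus a per-pCPU flag saying a flush is pending.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PCPUS  4   /* hypothetical number of physical CPUs */
#define NR_TOKENS 4   /* stand-in for the NR_CPUS bound on vcpu_token */

/* State L0 might keep for one nested guest (all names hypothetical). */
struct nested_guest {
    int  prev_cpu[NR_TOKENS];       /* pCPU each vCPU last ran on, -1 = never */
    bool need_tlb_flush[NR_PCPUS];  /* pCPUs that may hold stale TLB entries */
    int  flush_count[NR_PCPUS];     /* flushes performed, for illustration only */
};

void nested_guest_init(struct nested_guest *g)
{
    for (int i = 0; i < NR_TOKENS; i++)
        g->prev_cpu[i] = -1;
    for (int i = 0; i < NR_PCPUS; i++) {
        g->need_tlb_flush[i] = false;
        g->flush_count[i] = 0;
    }
}

/*
 * Called when L0 is about to run the vCPU identified by 'token' on 'pcpu'.
 * Returns -1 if the token is out of range, 1 if a flush was done on 'pcpu',
 * and 0 otherwise.
 */
int run_vcpu(struct nested_guest *g, int token, int pcpu)
{
    assert(pcpu >= 0 && pcpu < NR_PCPUS);

    /* Tokens index an array, so they must be bounded. */
    if (token < 0 || token >= NR_TOKENS)
        return -1;

    int prev = g->prev_cpu[token];
    if (prev != pcpu) {
        /* The pCPU the vCPU moved *from* is the one left with stale entries. */
        if (prev >= 0)
            g->need_tlb_flush[prev] = true;
        g->prev_cpu[token] = pcpu;
    }

    /* Flush lazily, the next time this guest runs on a marked pCPU. */
    if (g->need_tlb_flush[pcpu]) {
        g->need_tlb_flush[pcpu] = false;
        g->flush_count[pcpu]++;     /* a real L0 would flush the TLB here */
        return 1;
    }
    return 0;
}
```

In this model, if vCPU 0 runs on pCPU 1 and then moves to pCPU 2, pCPU 1 is marked; the next vCPU of the same guest to run on pCPU 1 triggers the flush there. Nothing in it requires L1 to flush another LPID, which is the point made above.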