On Wed, Apr 17, 2024 at 09:45:34AM +0000, Shameerali Kolothum Thodi wrote:

> Just to add to that. One idea could be like to have a case where when ECMDQs are
> detected, use that for issuing limited set of cmds(like stage 1 TLBIs) and use the
> normal cmdq for rest. Since we use stage 1 for both host and for Guest nested cases
> and TLBIs are the bottlenecks in most cases I think this should give performance
> benefits.

There are definitely options to look at to improve the performance
here.

IMHO the design of the ECMDQ largely seems to expect 1 queue per-cpu
and then we move to a lock-less design where each CPU uses its own
private per-cpu queue. In this case a VMM calling the kernel to do
invalidation would often naturally use a thread originating on a pCPU
bound to a vCPU which is substantially exclusive to the VM.

Jason