> > About smp-cache
> > ===============
> >
> > The API design has been discussed heavily in [3].
> >
> > Now, smp-cache is implemented as an array integrated in -machine.
> > Though -machine currently can't support the JSON format, this is one
> > of the directions for the future.
> >
> > An example is as follows:
> >
> > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> >
> > "cache" specifies the cache that the properties will be applied to.
> > This field is the combination of cache level and cache type. Now it
> > supports "l1d" (L1 data cache), "l1i" (L1 instruction cache), "l2"
> > (L2 unified cache) and "l3" (L3 unified cache).
> >
> > The "topology" field accepts CPU topology levels including "thread",
> > "core", "module", "cluster", "die", "socket", "book", "drawer" and a
> > special value "default".
>
> Looks good; just one thing, does "thread" make sense? I think that it's
> almost by definition that threads within a core share all caches, but
> maybe I'm missing some hardware configurations.

Hi Paolo, Merry Christmas.

Yes, AFAIK, there's no hardware that has thread-level caches.

The reason I considered the thread case is that it could be used for
vCPU scheduling optimization (although I haven't rigorously tested the
actual impact).

Without CPU affinity, tasks in Linux are generally distributed evenly
across different cores (for example, vCPU0 on Core 0, vCPU1 on Core 1).
In this case, the thread-level cache settings are closer to the actual
situation, with vCPU0 occupying the L1/L2 of host core 0 and vCPU1
occupying the L1/L2 of host core 1.

   ┌───┐        ┌───┐
   vCPU0        vCPU1
   │   │        │   │
   └───┘        └───┘

┌┌───┐┌───┐┐ ┌┌───┐┌───┐┐
││T0 ││T1 ││ ││T2 ││T3 ││
│└───┘└───┘│ │└───┘└───┘│
└────C0────┘ └────C1────┘

The L2 cache topology affects performance, and the cluster-aware
scheduling feature in the Linux kernel will try to schedule tasks onto
the same L2 cache. So, in cases like the figure above, setting the L2
cache to be per thread should, in principle, be better.

Thanks,
Zhao
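
P.S. For concreteness, the per-thread L2 case described above could be
written roughly as below. This is only a sketch that reuses the format
of the cover letter's example with different topology values; the exact
spelling of the -machine option may still change as the series evolves:

smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=thread,smp-cache.1.cache=l1d,smp-cache.1.topology=thread,smp-cache.2.cache=l2,smp-cache.2.topology=thread,smp-cache.3.cache=l3,smp-cache.3.topology=die

Here l1i, l1d and l2 are set to the "thread" level to match the figure,
while l3 stays at "die".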