On Wed, 25 Dec 2024 11:03:42 +0800 Zhao Liu <zhao1.liu@xxxxxxxxx> wrote:

> > > About smp-cache
> > > ===============
> > >
> > > The API design has been discussed heavily in [3].
> > >
> > > Now, smp-cache is implemented as an array integrated in -machine.
> > > Though -machine currently can't support JSON format, this is one
> > > of the directions for the future.
> > >
> > > An example is as follows:
> > >
> > > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> > >
> > > "cache" specifies the cache that the properties will be applied
> > > to. This field is the combination of cache level and cache type.
> > > It currently supports "l1d" (L1 data cache), "l1i" (L1 instruction
> > > cache), "l2" (L2 unified cache) and "l3" (L3 unified cache).
> > >
> > > The "topology" field accepts CPU topology levels including
> > > "thread", "core", "module", "cluster", "die", "socket", "book",
> > > "drawer" and a special value "default".
> >
> > Looks good; just one thing, does "thread" make sense? I think that
> > it's almost by definition that threads within a core share all
> > caches, but maybe I'm missing some hardware configurations.
>
> Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware that has
> thread-level caches.

Hi Zhao and Paolo,

The example looks OK to me and makes sense, but I would be curious to
know more scenarios where I can legitimately see a benefit. I am
wrestling with this point on ARM too: if I were to describe caches in
a way that gives threads their own private caches, this could not be
expressed via device tree due to spec limitations (+CCed Rob), if I
understood correctly.

Thanks,
Alireza

> The reason I considered the thread case is that it could be used for
> vCPU scheduling optimization (although I haven't rigorously tested
> the actual impact). Without CPU affinity, tasks in Linux are
> generally distributed evenly across different cores (for example,
> vCPU0 on Core 0, vCPU1 on Core 1). In this case, thread-level cache
> settings are closer to the actual situation, with vCPU0 occupying
> the L1/L2 of host core 0 and vCPU1 occupying the L1/L2 of host
> core 1.
>
>
>    ┌───┐        ┌───┐
>    vCPU0        vCPU1
>    │   │        │   │
>    └───┘        └───┘
> ┌┌───┐┌───┐┐ ┌┌───┐┌───┐┐
> ││T0 ││T1 ││ ││T2 ││T3 ││
> │└───┘└───┘│ │└───┘└───┘│
> └────C0────┘ └────C1────┘
>
>
> The L2 cache topology affects performance, and the cluster-aware
> scheduling feature in the Linux kernel tries to schedule tasks onto
> CPUs that share the same L2 cache. So, in cases like the figure
> above, setting the L2 cache to be per thread should, in principle,
> be better.
>
> Thanks,
> Zhao
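
For illustration, a full invocation using the syntax from the quoted
example could look as follows. This is a sketch rather than a command
taken from the thread: the binary, machine type and -smp values are
placeholders; only the smp-cache properties come from the example
above.

    # Hypothetical invocation: 1 socket x 2 dies x 2 modules x 2 cores
    # x 2 threads = 16 vCPUs, with L1i/L1d caches shared per core, L2
    # per module and L3 per die, matching the smp-cache example above.
    qemu-system-x86_64 \
        -machine q35,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die \
        -smp 16,sockets=1,dies=2,modules=2,cores=2,threads=2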