> -----Original Message-----
> From: Tim Chen [mailto:tim.c.chen@xxxxxxxxxxxxxxx]
> Sent: Friday, January 8, 2021 12:17 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>;
> valentin.schneider@xxxxxxx; catalin.marinas@xxxxxxx; will@xxxxxxxxxx;
> rjw@xxxxxxxxxxxxx; vincent.guittot@xxxxxxxxxx; lenb@xxxxxxxxxx;
> gregkh@xxxxxxxxxxxxxxxxxxx; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>;
> mingo@xxxxxxxxxx; peterz@xxxxxxxxxxxxx; juri.lelli@xxxxxxxxxx;
> dietmar.eggemann@xxxxxxx; rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx;
> mgorman@xxxxxxx; mark.rutland@xxxxxxx; sudeep.holla@xxxxxxx;
> aubrey.li@xxxxxxxxxxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-acpi@xxxxxxxxxxxxxxx; linuxarm@xxxxxxxxxxxxx; xuwei (O)
> <xuwei5@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; tiantao (H)
> <tiantao6@xxxxxxxxxxxxx>
> Subject: Re: [RFC PATCH v3 0/2] scheduler: expose the topology of clusters and
> add cluster scheduler
>
>
> On 1/6/21 12:30 AM, Barry Song wrote:
> > The ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and
> > each cluster has 4 CPUs. All clusters share L3 cache data, while each
> > cluster has a local L3 tag. On the other hand, each cluster shares some
> > internal system bus. This means cache is much more affine inside one
> > cluster than across clusters.
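As an illustration of the affinity rule described above (CPUs in the same cluster share an L3 tag and internal bus), the layout can be modeled in plain userspace Python. The consecutive CPU numbering and the helper names are assumptions of this sketch, not part of the patchset:

```python
# Illustrative userspace model of the Kunpeng 920 layout described
# above: 6 clusters per NUMA node, 4 CPUs per cluster.  Assumes CPUs
# are numbered consecutively; a real tool would read the topology from
# sysfs or ACPI PPTT instead of hardcoding it.

CPUS_PER_CLUSTER = 4
CLUSTERS_PER_NODE = 6
CPUS_PER_NODE = CPUS_PER_CLUSTER * CLUSTERS_PER_NODE  # 24

def cluster_id(cpu):
    """Cluster index of a CPU within its NUMA node."""
    return (cpu % CPUS_PER_NODE) // CPUS_PER_CLUSTER

def cluster_siblings(cpu):
    """CPUs sharing this CPU's cluster (and hence its L3 tag)."""
    node_base = cpu - cpu % CPUS_PER_NODE
    first = node_base + cluster_id(cpu) * CPUS_PER_CLUSTER
    return list(range(first, first + CPUS_PER_CLUSTER))

print(cluster_id(5))        # -> 1: CPU5 is in the second cluster
print(cluster_siblings(5))  # -> [4, 5, 6, 7]
```

A cluster-aware wake-up path would then prefer an idle CPU among cluster_siblings() of the waker before falling back to the rest of the LLC domain.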
> >
> > +-----------------------------------+                          +---------+
> > |  +------+    +------+             +--------------------------+         |
> > |  | CPU0 |    | cpu1 |             |     +-----------+        |         |
> > |  +------+    +------+             |     |           |        |         |
> > |                                   +-----+    L3     |        |         |
> > |  +------+    +------+   cluster   |     |    tag    |        |         |
> > |  | CPU2 |    | CPU3 |             |     |           |        |         |
> > |  +------+    +------+             |     +-----------+        |         |
> > |                                   |                          |         |
> > +-----------------------------------+                          |         |
> > +-----------------------------------+                          |         |
> > |  +------+    +------+             +--------------------------+         |
> > |  |      |    |      |             |     +-----------+        |         |
> > |  +------+    +------+             |     |           |        |         |
> > |                                   +-----+    L3     |        |         |
> > |  +------+    +------+   cluster   |     |    tag    |        |         |
> > |  |      |    |      |             |     |           |        |         |
> > |  +------+    +------+             |     +-----------+        |   L3    |
> > |                                   |                          |  data   |
> > +-----------------------------------+                          |         |
> > +-----------------------------------+                          |         |
> > |  +------+    +------+             +--------------------------+         |
> > |  |      |    |      |             |     +-----------+        |         |
> > |  +------+    +------+             |     |           |        |         |
> > |                                   +-----+    L3     |        |         |
> > |  +------+    +------+   cluster   |     |    tag    |        |         |
> > |  |      |    |      |             |     |           |        |         |
> > |  +------+    +------+             |     +-----------+        |         |
> > |                                   |                          |         |
> > +-----------------------------------+                          |         |
> > +-----------------------------------+                          |         |
> > |  +------+    +------+             +--------------------------+         |
> > |  |      |    |      |             |     +-----------+        |         |
> > |  +------+    +------+             |     |           |        |         |
> > |                                   +-----+    L3     |        |         |
> > |  +------+    +------+   cluster   |     |    tag    |        |         |
> > |  |      |    |      |             |     |           |        |         |
> > |  +------+    +------+             |     +-----------+        |         |
> > |                                   |                          |         |
> > +-----------------------------------+                          |         |
> > +-----------------------------------+                          |         |
> > |  +------+    +------+             +--------------------------+         |
> > |  |      |    |      |             |     +-----------+        |         |
> > |  +------+    +------+             |     |           |        |         |
> >
> >
> There is a similar need for clustering in x86.  Some x86 cores could share
> L2 caches, similar to the clusters in Kunpeng 920 (e.g. on Jacobsville
> there are 6 clusters of 4 Atom cores, each cluster sharing a separate L2,
> and 24 cores sharing L3).  Having a sched domain at the L2 cluster level
> helps spread load among L2 domains.
> This will reduce L2 cache contention and help performance in low to
> moderate load scenarios.
>
> The cluster detection mechanism will need to be based on L2 cache sharing
> in this case.  I suggest making the cluster detection CPU architecture
> dependent, so that both the ARM64 and x86 use cases can be accommodated.
>
> Attached below are two RFC patches for creating an x86 L2 cache sched
> domain, sans the idle cpu selection on wake up code.  It is similar
> enough in concept to Barry's patch that we should have a single patchset
> that accommodates both use cases.

Hi Tim,

Agreed on this. Hopefully the RFC v4 I am preparing will cover your case.

> Thanks.
>
> Tim

Thanks
Barry
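For illustration, the L2-sharing-based detection Tim suggests can be sketched in userspace. A live implementation would read each CPU's /sys/devices/system/cpu/cpuN/cache/index2/shared_cpu_list from sysfs cacheinfo; the sample data below (two 4-core L2 groups, loosely Jacobsville-like) is invented purely for this sketch:

```python
# Sketch of cluster detection from shared-L2 masks: CPUs whose
# shared_cpu_list for the L2 cache index is identical belong to the
# same cluster.  The sample dict stands in for sysfs reads and is
# illustrative only.

def parse_cpu_list(s):
    """Parse a sysfs-style CPU list such as '0-3' or '0,2,4-7'."""
    cpus = set()
    for part in s.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return frozenset(cpus)

def detect_clusters(shared_l2):
    """One cluster per distinct shared-L2 CPU mask."""
    masks = {parse_cpu_list(v) for v in shared_l2.values()}
    return [sorted(m) for m in sorted(masks, key=min)]

# Two hypothetical L2 groups of four cores each.
sample = {cpu: ("0-3" if cpu < 4 else "4-7") for cpu in range(8)}
print(detect_clusters(sample))  # -> [[0, 1, 2, 3], [4, 5, 6, 7]]
```

On ARM64 the same grouping would instead come from the PPTT/DT cluster nodes, which is why an architecture-dependent hook fits both cases.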