Hi Peter,

On 2/25/25 9:11 AM, Peter Newman wrote:
> Hi Reinette,
>
> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>>
>> Hi Peter,
>>
>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>
>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>
>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>> which seems to be needed for your proposal also?
>>>>>>>>>>>>>> If we use possible flags of:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>
>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>
>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>> for.
>>>>>>>>
>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>
>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>> has space for 32 events for each "CPU agent" resource.
>>>>>>>> That does not take into
>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>> customers.
>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>
>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>> event names.
>>>>>>
>>>>>> Thank you for clarifying.
>>>>>>
>>>>>>>
>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>> which events should share a counter and which should be counted by
>>>>>>> separate counters. I think the amount of information that would need
>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>
>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>> writes in ABMC would look like...
>>>>>>>
>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>
>>>>>>> (per domain)
>>>>>>> group 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>> example and above the counter configuration appears to be global.
>>>>>> You do mention
>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>> configuration is a requirement?
>>>>>
>>>>> If it's global and we want a particular group to be watched by more
>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>> for that group in all domains, or allocating counters in domains where
>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>> there's less pressure on the counters.
>>>>>
>>>>> In Dave's proposal it looks like global configuration means
>>>>> globally-defined "named counter configurations", which works because
>>>>> it's really per-domain assignment of the configurations to however
>>>>> many counters the group needs in each domain.
>>>>
>>>> I think I am becoming lost. Would a global configuration not break your
>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>> globally then it would not make it possible to support the full configurability
>>>> of the hardware.
>>>> Before I add more confusion, let me try with an example that builds on your
>>>> earlier example copied below:
>>>>
>>>>>>> (per domain)
>>>>>>> group 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>> ...
>>>>
>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>> I understand it:
>>>>
>>>> group 0:
>>>>   domain 0:
>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>   domain 1:
>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>   domain 0:
>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>   domain 1:
>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>
>>>> You mention that you do not want counters to be allocated in domains that they
>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>> in domain 1, resulting in:
>>>>
>>>> group 0:
>>>>   domain 0:
>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>   domain 0:
>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>   domain 1:
>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>
>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>
>>>> group 0:
>>>>   domain 0:
>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>   domain 0:
>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>   domain 1:
>>>>     counter 0: LclFill,RmtFill
>>>>     counter 1: LclNTWr,RmtNTWr
>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>     counter 3: VictimBW
>>>>
>>>> The counters are shown with different per-domain configurations that seems to
>>>> match with earlier goals of (a) choose events counted by each counter and
>>>> (b) do not allocate counters in domains where they are not needed.
>>>> As I
>>>> understand the above does contradict global counter configuration though.
>>>> Or do you mean that only the *name* of the counter is global and then
>>>> that it is reconfigured as part of every assignment?
>>>
>>> Yes, I meant only the *name* is global. I assume based on a particular
>>> system configuration, the user will settle on a handful of useful
>>> groupings to count.
>>>
>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>
>>> # define global configurations (in ABMC terms), not necessarily in this
>>> # syntax and probably not in the mbm_assign_control file.
>>>
>>> r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>> w=VictimBW,LclNTWr,RmtNTWr
>>>
>>> # legacy "total" configuration, effectively r+w
>>> t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>
>>> /group0/0=t;1=t
>>> /group1/0=t;1=t
>>> /group2/0=_;1=t
>>> /group3/0=rw;1=_
>>>
>>> - group2 is restricted to domain 0
>>> - group3 is restricted to domain 1
>>> - the rest are unrestricted
>>> - In group3, we decided we need to separate read and write traffic
>>>
>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>
>>
>> I see. Thank you for the example.
>>
>> resctrl supports per-domain configurations with the following possible when
>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>
>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>
>> /group0/0=t;1=t
>> /group1/0=t;1=t
>>
>> Even though the flags are identical in all domains, the assigned counters will
>> be configured differently in each domain.
>>
>> With this supported by hardware and currently also supported by resctrl it seems
>> reasonable to carry this forward to what will be supported next.
>
> The hardware supports both a per-domain mode, where all groups in a
> domain use the same configurations and are limited to two events per
> group and a per-group mode where every group can be configured and
> assigned freely. This series is using the legacy counter access mode
> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> in the domain can be read. If we chose to read the assigned counter
> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> rather than asking the hardware to find the counter by RMID, we would
> not be limited to 2 counters per group/domain and the hardware would
> have the same flexibility as on MPAM.
>
> (I might have said something confusing in my last messages because I
> had forgotten that I switched to the extended assignment mode when
> prototyping with soft-ABMC and MPAM.)
>
> Forcing all groups on a domain to share the same 2 counter
> configurations would not be acceptable for us, as the example I gave
> earlier is one I've already been asked about.

I am surprised to hear this at this point of this work. Sounds like we need to go
back a couple of steps to determine how to best support user requirements that
now includes per-group counter assignment. Have you perhaps looked into
how users access the counter data as part of your prototyping?

Reinette
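
For reference, the counter arithmetic in the quoted mbm_assign_control example (4 counters in domain 0, 3 in domain 1) can be checked with a small sketch. It assumes that each flag letter in an assignment consumes one counter in that domain and that "_" means no counter is assigned. The helper name counters_per_domain and the parsing below are hypothetical illustrations of the syntax under discussion, not resctrl code or a settled interface.

```python
from collections import Counter


def counters_per_domain(assignments):
    """Count the counters consumed in each domain.

    Assumes one counter per flag letter, with '_' meaning "unassigned".
    Each line is expected to look like: /<group>/<domain>=<flags>;<domain>=<flags>
    """
    used = Counter()
    for line in assignments:
        # Split off the leading empty field and the group name.
        _, _group, domains = line.split("/", 2)
        for entry in domains.split(";"):
            domain, flags = entry.split("=")
            used[int(domain)] += sum(1 for flag in flags if flag != "_")
    return dict(used)


# Assignment lines copied from the quoted example.
example = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]

print(counters_per_domain(example))  # {0: 4, 1: 3}
```

Under those assumptions the totals agree with the numbers quoted above: group3's "rw" takes two counters in domain 0, and the "_" entries take none.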