Hi Babu,

On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@xxxxxxx> wrote:
>
> Hi Peter,
>
> On 2/25/25 11:11, Peter Newman wrote:
> > Hi Reinette,
> >
> > On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> > <reinette.chatre@xxxxxxxxx> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 2/21/25 5:12 AM, Peter Newman wrote:
> >>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> >>> <reinette.chatre@xxxxxxxxx> wrote:
> >>>> On 2/20/25 6:53 AM, Peter Newman wrote:
> >>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> >>>>> <reinette.chatre@xxxxxxxxx> wrote:
> >>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
> >>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> >>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
> >>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
> >>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>>>>>>>
> >>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>>>>>>>
> >>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>>>>>>>> which seems to be needed for your proposal also?
> >>>>>>>>>>>>>> If we use possible flags of:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> mbm_local_read_bytes a
> >>>>>>>>>>>>>> mbm_local_write_bytes b
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>>>>>>>> <value>
> >>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>>>>>>>
> >>>>>>>>>> As mentioned above, one possible issue with existing interface is that
> >>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>>>>>>>> is low enough to be of concern.
> >>>>>>>>>
> >>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
> >>>>>>>>> so far are combinable, so 26 counters per group today means it limits
> >>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>>>>>>>> investigation, I would question whether they know what they're looking
> >>>>>>>>> for.
> >>>>>>>>
> >>>>>>>> The key here is "so far" as well as the focus on MBM only.
> >>>>>>>>
> >>>>>>>> It is impossible for me to predict what we will see in a couple of years
> >>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
> >>>>>>>> has space for 32 events for each "CPU agent" resource.
> >>>>>>>> That does not take into
> >>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >>>>>>>> that he is working on patches [1] that will add new events and shared the idea
> >>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
> >>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >>>>>>>> customers.
> >>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
> >>>>>>>
> >>>>>>> I was thinking of the letters as representing a reusable, user-defined
> >>>>>>> event-set for applying to a single counter rather than as individual
> >>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
> >>>>>>> one counts. Wherever we define the letters, we could use more symbolic
> >>>>>>> event names.
> >>>>>>
> >>>>>> Thank you for clarifying.
> >>>>>>
> >>>>>>>
> >>>>>>> In the letters as events model, choosing the events assigned to a
> >>>>>>> group wouldn't be enough information, since we would want to control
> >>>>>>> which events should share a counter and which should be counted by
> >>>>>>> separate counters. I think the amount of information that would need
> >>>>>>> to be encoded into mbm_assign_control to represent the level of
> >>>>>>> configurability supported by hardware would quickly get out of hand.
> >>>>>>>
> >>>>>>> Maybe as an example, one counter for all reads, one counter for all
> >>>>>>> writes in ABMC would look like...
> >>>>>>>
> >>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
> >>>>>>>
> >>>>>>> (per domain)
> >>>>>>> group 0:
> >>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> group 1:
> >>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> ...
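To make the per-counter event sets concrete, here is a minimal sketch that turns each BwType list into a bitmask. It assumes the bit positions the kernel's resctrl documentation gives for the BMEC mbm_total_bytes_config/mbm_local_bytes_config files. The pairing of the BwType names with those bits is inferred from the bit descriptions, and it is an assumption that L3_QOS_ABMC_CFG.BwType uses the same layout. bwtype_mask() is a hypothetical helper.

```python
# Assumed bit layout: borrowed from the BMEC event bits documented for
# mbm_total_bytes_config / mbm_local_bytes_config. That
# L3_QOS_ABMC_CFG.BwType uses the same positions is an assumption.
BWTYPE_BITS = {
    "LclFill": 0,      # reads to memory in the local NUMA domain
    "RmtFill": 1,      # reads to memory in the non-local NUMA domain
    "LclNTWr": 2,      # non-temporal writes to the local NUMA domain
    "RmtNTWr": 3,      # non-temporal writes to the non-local NUMA domain
    "LclSlowFill": 4,  # reads to slow memory in the local NUMA domain
    "RmtSlowFill": 5,  # reads to slow memory in the non-local NUMA domain
    "VictimBW": 6,     # dirty victims to all types of memory
}


def bwtype_mask(events):
    """Hypothetical helper: OR together the bit of every named source."""
    mask = 0
    for name in events:
        mask |= 1 << BWTYPE_BITS[name]  # raises KeyError for unknown names
    return mask


reads = bwtype_mask(["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"])
writes = bwtype_mask(["VictimBW", "LclNTWr", "RmtNTWr"])
print(hex(reads), hex(writes), hex(reads | writes))  # 0x33 0x4c 0x7f
```

Under these assumptions the "all reads" and "all writes" sets are disjoint. Together they equal 0x7f, the default mbm_total_bytes_config value, so one read counter and one write counter partition exactly what the legacy total event counts.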
> >>>>>>>
> >>>>>>
> >>>>>> I think this may also be what Dave was heading towards in [2] but in that
> >>>>>> example and above the counter configuration appears to be global. You do mention
> >>>>>> "configurability supported by hardware" so I wonder if per-domain counter
> >>>>>> configuration is a requirement?
> >>>>>
> >>>>> If it's global and we want a particular group to be watched by more
> >>>>> counters, I wouldn't want this to result in allocating more counters
> >>>>> for that group in all domains, or allocating counters in domains where
> >>>>> they're not needed. I want to encourage my users to avoid allocating
> >>>>> monitoring resources in domains where a job is not allowed to run so
> >>>>> there's less pressure on the counters.
> >>>>>
> >>>>> In Dave's proposal it looks like global configuration means
> >>>>> globally-defined "named counter configurations", which works because
> >>>>> it's really per-domain assignment of the configurations to however
> >>>>> many counters the group needs in each domain.
> >>>>
> >>>> I think I am becoming lost. Would a global configuration not break your
> >>>> view of "event-set applied to a single counter"? If a counter is configured
> >>>> globally then it would not make it possible to support the full configurability
> >>>> of the hardware.
> >>>> Before I add more confusion, let me try with an example that builds on your
> >>>> earlier example copied below:
> >>>>
> >>>>>>> (per domain)
> >>>>>>> group 0:
> >>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> group 1:
> >>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> ...
> >>>>
> >>>> Since the above states "per domain" I rewrite the example to highlight that as
> >>>> I understand it:
> >>>>
> >>>> group 0:
> >>>>   domain 0:
> >>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>   domain 1:
> >>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>> group 1:
> >>>>   domain 0:
> >>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>   domain 1:
> >>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>
> >>>> You mention that you do not want counters to be allocated in domains that they
> >>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> >>>> in domain 1, resulting in:
> >>>>
> >>>> group 0:
> >>>>   domain 0:
> >>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>> group 1:
> >>>>   domain 0:
> >>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>   domain 1:
> >>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>
> >>>> With counter 0 and counter 1 available in domain 1, these counters could
> >>>> theoretically be configured to give group 1 more data in domain 1:
> >>>>
> >>>> group 0:
> >>>>   domain 0:
> >>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>> group 1:
> >>>>   domain 0:
> >>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>   domain 1:
> >>>>     counter 0: LclFill,RmtFill
> >>>>     counter 1: LclNTWr,RmtNTWr
> >>>>     counter 2: LclSlowFill,RmtSlowFill
> >>>>     counter 3: VictimBW
> >>>>
> >>>> The counters are shown with different per-domain configurations that seems to
> >>>> match with earlier goals of (a) choose events counted by each counter and
> >>>> (b) do not allocate counters in domains where they are not needed. As I
> >>>> understand the above does contradict global counter configuration though.
> >>>> Or do you mean that only the *name* of the counter is global and then
> >>>> that it is reconfigured as part of every assignment?
> >>>
> >>> Yes, I meant only the *name* is global. I assume based on a particular
> >>> system configuration, the user will settle on a handful of useful
> >>> groupings to count.
> >>>
> >>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
> >>>
> >>> # define global configurations (in ABMC terms), not necessarily in this
> >>> # syntax and probably not in the mbm_assign_control file.
> >>>
> >>> r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>> w=VictimBW,LclNTWr,RmtNTWr
> >>>
> >>> # legacy "total" configuration, effectively r+w
> >>> t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>
> >>> /group0/0=t;1=t
> >>> /group1/0=t;1=t
> >>> /group2/0=_;1=t
> >>> /group3/0=rw;1=_
> >>>
> >>> - group2 is restricted to domain 0
> >>> - group3 is restricted to domain 1
> >>> - the rest are unrestricted
> >>> - In group3, we decided we need to separate read and write traffic
> >>>
> >>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
> >>>
> >>
> >> I see. Thank you for the example.
> >>
> >> resctrl supports per-domain configurations with the following possible when
> >> using mbm_total_bytes_config and mbm_local_bytes_config:
> >>
> >> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
> >>
> >> /group0/0=t;1=t
> >> /group1/0=t;1=t
> >>
> >> Even though the flags are identical in all domains, the assigned counters will
> >> be configured differently in each domain.
> >>
> >> With this supported by hardware and currently also supported by resctrl it seems
> >> reasonable to carry this forward to what will be supported next.
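The counter arithmetic in the group0..group3 example above can be checked mechanically. This minimal sketch makes three assumptions. Each line is read as `<ctrl group>/<mon group>/<domain>=<flags>;...`. Every flag letter consumes one counter in its domain. `_` means "no counter in this domain". counters_per_domain() is a hypothetical helper, not part of resctrl. Under this reading, group2's only counter is in domain 1 and group3's two counters are in domain 0, which is the reverse of the restrictions listed in the bullets. It is this placement that produces the stated totals of 4 and 3.

```python
from collections import Counter


def counters_per_domain(lines):
    """Hypothetical helper: count the counters each domain must supply.

    Assumes every flag letter consumes one counter in its domain and
    that "_" means "no counter in this domain".
    """
    used = Counter()
    for line in lines:
        # "<ctrl group>/<mon group>/<dom>=<flags>;<dom>=<flags>;..."
        _ctrl, _mon, assignments = line.split("/", 2)
        for item in assignments.split(";"):
            domain, flags = item.split("=", 1)
            used[int(domain)] += sum(1 for flag in flags if flag != "_")
    return dict(used)


example = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]
print(counters_per_domain(example))  # {0: 4, 1: 3}
```

Because a flag names a configuration rather than a single event, "rw" costs two counters while "t" costs one, even though both cover the same traffic.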
> >
> > The hardware supports both a per-domain mode, where all groups in a
> > domain use the same configurations and are limited to two events per
> > group and a per-group mode where every group can be configured and
> > assigned freely. This series is using the legacy counter access mode
> > where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> > in the domain can be read. If we chose to read the assigned counter
> > directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> > rather than asking the hardware to find the counter by RMID, we would
> > not be limited to 2 counters per group/domain and the hardware would
> > have the same flexibility as on MPAM.
>
> In extended mode, the contents of a specific counter can be read by
> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
> QM_CTR will then return the contents of the specified counter.
>
> It is documented below.
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
> Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>
> We previously discussed this with you (off the public list) and I
> initially proposed the extended assignment mode.
>
> Yes, the extended mode allows greater flexibility by enabling multiple
> counters to be assigned to the same group, rather than being limited to
> just two.
>
> However, the challenge is that we currently lack the necessary interfaces
> to configure multiple events per group. Without these interfaces, the
> extended mode is not practical at this time.
>
> Therefore, we ultimately agreed to use the legacy mode, as it does not
> require modifications to the existing interface, allowing us to continue
> using it as is.
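For comparison with the legacy path, here is a minimal sketch of how the extended-mode QM_EVTSEL value described above might be composed. Several details are assumptions to check against section 19.3.3.3 of the APM: EvtID in bits 7:0, ExtendedEvtID in bit 31, the RMID field starting at bit 32, and a numeric value of 1 for L3CacheABMC. qm_evtsel_abmc_counter() is a hypothetical helper. Writing the MSR and reading QM_CTR are left to the kernel.

```python
# All field positions and the event number are assumptions to verify
# against AMD APM Vol. 2, section 19.3.3.3 before relying on them.
EVTID_SHIFT = 0            # QM_EVTSEL[EvtID], assumed bits 7:0
EXTENDED_EVTID_BIT = 31    # QM_EVTSEL[ExtendedEvtID], assumed bit 31
RMID_SHIFT = 32            # QM_EVTSEL[RMID], assumed to start at bit 32
L3_CACHE_ABMC_EVTID = 1    # assumed EvtID value for L3CacheABMC


def qm_evtsel_abmc_counter(counter_id):
    """Hypothetical helper: QM_EVTSEL value that selects one ABMC counter.

    In extended mode the RMID field carries a counter ID instead of an
    RMID, so a following QM_CTR read returns that counter's contents.
    """
    if counter_id < 0:
        raise ValueError("counter_id must be non-negative")
    return ((counter_id << RMID_SHIFT)
            | (1 << EXTENDED_EVTID_BIT)
            | (L3_CACHE_ABMC_EVTID << EVTID_SHIFT))


print(hex(qm_evtsel_abmc_counter(5)))  # 0x580000001
```

The difference from legacy mode is what the RMID field means. In extended mode it names a counter directly, whereas in legacy mode it names a group and the hardware must locate a counter whose BwType matches a QOS_EVT_CFG_n instance.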
>
> >
> > (I might have said something confusing in my last messages because I
> > had forgotten that I switched to the extended assignment mode when
> > prototyping with soft-ABMC and MPAM.)
> >
> > Forcing all groups on a domain to share the same 2 counter
> > configurations would not be acceptable for us, as the example I gave
> > earlier is one I've already been asked about.
>
> I don’t see this as a blocker. It should be considered an extension to the
> current ABMC series. We can easily build on top of this series once we
> finalize how to configure the multiple event interface for each group.

I don't think it is, either. Only being able to use ABMC to assign
counters is fine for our use as an incremental step.

My longer-term concern is the domain-scoped mbm_total_bytes_config and
mbm_local_bytes_config files, but they were introduced with BMEC, so
there's already an expectation that the files are present when BMEC is
supported. On ABMC hardware that also supports BMEC, I'm concerned about
enabling ABMC when only the BMEC-style event configuration interface
exists.

The scope of my issue is just whether enabling "full" ABMC support will
require an additional opt-in, since that could remove the BMEC
interface. If it does, it's something we can live with.

-Peter