Hi Peter/Reinette,

On 2/26/25 10:25, Reinette Chatre wrote:
> Hi Peter,
>
> On 2/26/25 5:27 AM, Peter Newman wrote:
>> Hi Babu,
>>
>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@xxxxxxx> wrote:
>>>
>>> Hi Peter,
>>>
>>> On 2/25/25 11:11, Peter Newman wrote:
>>>> Hi Reinette,
>>>>
>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also?
>>>>>>>>>>>>>>>>> If we use possible flags of:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>
>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>
>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>> for.
>>>>>>>>>>>
>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>
>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>> to support their users.
>>>>>>>>>>> Just looking at the Intel RDT spec the event register
>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>> customers.
>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>
>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>> event names.
>>>>>>>>>
>>>>>>>>> Thank you for clarifying.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>
>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>
>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>
>>>>>>>>>> (per domain)
>>>>>>>>>> group 0:
>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>> configuration is a requirement?
>>>>>>>>
>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>> there's less pressure on the counters.
>>>>>>>>
>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>> many counters the group needs in each domain.
>>>>>>>
>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>> of the hardware.
>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>> earlier example copied below:
>>>>>>>
>>>>>>>>>> (per domain)
>>>>>>>>>> group 0:
>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> ...
>>>>>>>
>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>> I understand it:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>> are not needed in.
>>>>>>> So, let's say group 0 does not need counter 0 and counter 1
>>>>>>> in domain 1, resulting in:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 0: LclFill,RmtFill
>>>>>>>   counter 1: LclNTWr,RmtNTWr
>>>>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW
>>>>>>>
>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>
>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>> system configuration, the user will settle on a handful of useful
>>>>>> groupings to count.
>>>>>>
>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>
>>>>>> # define global configurations (in ABMC terms), not necessarily in this
>>>>>> # syntax and probably not in the mbm_assign_control file.
>>>>>>
>>>>>> r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>> w=VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>> # legacy "total" configuration, effectively r+w
>>>>>> t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>> /group0/0=t;1=t
>>>>>> /group1/0=t;1=t
>>>>>> /group2/0=_;1=t
>>>>>> /group3/0=rw;1=_
>>>>>>
>>>>>> - group2 is restricted to domain 0
>>>>>> - group3 is restricted to domain 1
>>>>>> - the rest are unrestricted
>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>
>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>
>>>>>
>>>>> I see. Thank you for the example.
>>>>>
>>>>> resctrl supports per-domain configurations with the following possible when
>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>
>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>
>>>>> /group0/0=t;1=t
>>>>> /group1/0=t;1=t
>>>>>
>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>> be configured differently in each domain.
>>>>>
>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>> reasonable to carry this forward to what will be supported next.
>>>>
>>>> The hardware supports both a per-domain mode, where all groups in a
>>>> domain use the same configurations and are limited to two events per
>>>> group and a per-group mode where every group can be configured and
>>>> assigned freely. This series is using the legacy counter access mode
>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>> in the domain can be read.
>>>> If we chose to read the assigned counter
>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>> rather than asking the hardware to find the counter by RMID, we would
>>>> not be limited to 2 counters per group/domain and the hardware would
>>>> have the same flexibility as on MPAM.
>>>
>>> In extended mode, the contents of a specific counter can be read by
>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>> QM_CTR will then return the contents of the specified counter.
>>>
>>> It is documented below.
>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>> Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>
>>> We previously discussed this with you (off the public list) and I
>>> initially proposed the extended assignment mode.
>>>
>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>> counters to be assigned to the same group, rather than being limited to
>>> just two.
>>>
>>> However, the challenge is that we currently lack the necessary interfaces
>>> to configure multiple events per group. Without these interfaces, the
>>> extended mode is not practical at this time.
>>>
>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>> require modifications to the existing interface, allowing us to continue
>>> using it as is.
>>>
>>>>
>>>> (I might have said something confusing in my last messages because I
>>>> had forgotten that I switched to the extended assignment mode when
>>>> prototyping with soft-ABMC and MPAM.)
>>>>
>>>> Forcing all groups on a domain to share the same 2 counter
>>>> configurations would not be acceptable for us, as the example I gave
>>>> earlier is one I've already been asked about.
>>>
>>> I don’t see this as a blocker. It should be considered an extension to the
>>> current ABMC series.
>>> We can easily build on top of this series once we
>>> finalize how to configure the multiple event interface for each group.
>>
>> I don't think it is, either. Only being able to use ABMC to assign
>> counters is fine for our use as an incremental step. My longer-term
>> concern is the domain-scoped mbm_total_bytes_config and
>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>> there's already an expectation that the files are present when BMEC is
>> supported. It's good that we at least know about this concern now.
>> Let's take a step back and figure out how we can address it.
>>
>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>> ABMC when only the BMEC-style event configuration interface exists.
>
> ABMC currently depends on BMEC making the current implementation the
> one you are concerned about?
> https://lore.kernel.org/lkml/e4111779ebb0e7004dbedc258eeae2677f578ab1.1737577229.git.babu.moger@xxxxxxx/

I think it is more than that.

The ABMC feature allows event configuration by writing to L3_QOS_ABMC_CFG,
where we can set cntr_id, RMID, and event configuration. Currently, we
derive the event configuration from the BMEC settings (either
mbm_total_bytes_config or mbm_local_bytes_config).

If we don’t use BMEC values, we would need to require users to manually
specify event configuration settings.

struct mbm_cntr_cfg {
	enum resctrl_event_id evtid;
	struct rdtgroup *rdtgrp;
};

Currently, we determine the RMID from the rdtgroup and the event type,
while event configuration relies on BMEC.

To make event configuration independent of BMEC, we can include an
explicit event configuration field:

struct mbm_cntr_cfg {
	enum resctrl_event_id evtid;
	u32 evt_cfg;	// User-provided config value
	struct rdtgroup *rdtgrp;
};

Key Considerations

1. Counter Management: Managing counters globally (like CLOSID
   management) would be simpler than handling them at the domain level,
   though domain-level management is feasible.

2. User Input: Users will need to specify event configuration when
   assigning events.

Here is a quick example using our current interface:

a. List the group.
   # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   //0=t:0x1F,l:0x15;1=t:0x1F,l:0x15

b. Unassign an event:
   # echo "//0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   //0=t:0x1F;1=t:0x1F,l:0x15

c. Assign an event:
   # echo "//0+l:0x15" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Note that I don't want to rush here.

Peter, can you please spend some time and propose the interface you are
thinking of, based on both ABMC and MPAM?

>
>> The scope of my issue is just whether enabling "full" ABMC support
>> will require an additional opt-in, since that could remove the BMEC
>> interface. If it does, it's something we can live with.
>
>
> Reinette
>
>

-- 
Thanks
Babu Moger
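
For illustration only, here is a small user-space sketch of how one
per-domain entry from the example above (such as "t:0x1F,l:0x15") could be
split into an event flag and an event configuration value. The
"flag:config" form is only the syntax used in that example, not a settled
interface. struct evt_assign and parse_domain_assign() are hypothetical
names that do not exist in resctrl, and restricting the flags to 't' and
'l' is an assumption taken from the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: one parsed "flag:config" token, e.g. "t:0x1F". */
struct evt_assign {
	char flag;		/* 't' (total) or 'l' (local), as in the example */
	uint32_t evt_cfg;	/* user-provided event configuration value */
};

/*
 * Parse a comma-separated per-domain list such as "t:0x1F,l:0x15".
 * Returns the number of entries stored in out[], or -1 if the input is
 * malformed or holds more than max entries. An empty string yields 0.
 */
static int parse_domain_assign(const char *s, struct evt_assign *out, int max)
{
	int n = 0;

	assert(s && out);

	while (*s) {
		char *end;
		unsigned long val;

		if (n == max)
			return -1;
		/* Each token must look like <flag>:<number>. */
		if ((s[0] != 't' && s[0] != 'l') || s[1] != ':')
			return -1;
		if (!isdigit((unsigned char)s[2]))
			return -1;

		errno = 0;
		val = strtoul(s + 2, &end, 0);	/* base 0 accepts 0x.. and decimal */
		if (errno || val > UINT32_MAX)
			return -1;

		out[n].flag = s[0];
		out[n].evt_cfg = (uint32_t)val;
		n++;

		if (*end == ',') {
			s = end + 1;
			if (!*s)
				return -1;	/* trailing comma */
		} else if (*end == '\0') {
			s = end;
		} else {
			return -1;	/* unexpected character after the value */
		}
	}

	return n;
}
```

With this sketch, "t:0x1F,l:0x15" gives two entries ('t' with 0x1F and 'l'
with 0x15), while input such as "x:0x1F" or "t:" is rejected.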