Hi,

On Wed, Feb 19, 2025 at 12:28:16PM +0100, Peter Newman wrote:
> Hi Reinette,
>
> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
> >
> > Hi Peter,
> >
> > On 2/17/25 2:26 AM, Peter Newman wrote:
> > > Hi Reinette,
> > >
> > > On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> > > <reinette.chatre@xxxxxxxxx> wrote:

[...]

> > >> As mentioned above, one possible issue with existing interface is that
> > >> it is limited to 26 events (assuming only lower case letters are used). The limit
> > >> is low enough to be of concern.
> > >
> > > The events which can be monitored by a single counter on ABMC and MPAM
> > > so far are combinable, so 26 counters per group today means it limits
> > > breaking down MBM traffic for each group 26 ways. If a user complained
> > > that a 26-way breakdown of a group's MBM traffic was limiting their
> > > investigation, I would question whether they know what they're looking
> > > for.
> >
> > The key here is "so far" as well as the focus on MBM only.
> >
> > It is impossible for me to predict what we will see in a couple of years
> > from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> > to support their users. Just looking at the Intel RDT spec the event register
> > has space for 32 events for each "CPU agent" resource. That does not take into
> > account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> > that he is working on patches [1] that will add new events and shared the idea
> > that we may be trending to support "perf" like events associated with RMID. I
> > expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> > customers.
> > This all makes me think that resctrl should be ready to support more events than 26.
>
> I was thinking of the letters as representing a reusable, user-defined
> event-set for applying to a single counter rather than as individual
> events, since MPAM and ABMC allow us to choose the set of events each
> one counts. Wherever we define the letters, we could use more symbolic
> event names.
>
> In the letters as events model, choosing the events assigned to a
> group wouldn't be enough information, since we would want to control
> which events should share a counter and which should be counted by
> separate counters. I think the amount of information that would need
> to be encoded into mbm_assign_control to represent the level of
> configurability supported by hardware would quickly get out of hand.
>
> Maybe as an example, one counter for all reads, one counter for all
> writes in ABMC would look like...
>
> (L3_QOS_ABMC_CFG.BwType field names below)
>
> (per domain)
> group 0:
>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 1: VictimBW,LclNTWr,RmtNTWr
> group 1:
>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 3: VictimBW,LclNTWr,RmtNTWr
> ...
>
> I assume packing all of this info for a group's desired counter
> configuration into a single line (with 32 domains per line on many
> dual-socket AMD configurations I see) would be difficult to look at,
> even if we could settle on a single letter to represent each
> universally.
>
> >
> > My goal is for resctrl to have a user interface that can as much as possible
> > be ready for whatever may be required from it years down the line. Of course,
> > I may be wrong and resctrl would never need to support more than 26 events per
> > resource (*). The risk is that resctrl *may* need to support more than 26 events
> > and how could resctrl support that?
> >
> > What is the risk of supporting more than 26 events? As I highlighted earlier
> > the interface I used as demonstration may become unwieldy to parse on a system
> > with many domains that supports many events.
> > This is a concern for me. Any suggestions
> > will be appreciated, especially from you since I know that you are very familiar with
> > issues related to large scale use of resctrl interfaces.
>
> It's mainly just the unwieldiness of all the information in one file.
> It's already at the limit of what I can visually look through.
>
> I believe that shared assignments will take care of all the
> high-frequency and performance-intensive batch configuration updates I
> was originally concerned about, so I no longer see much benefit in
> finding ways to textually encode all this information in a single file
> when it would be more manageable to distribute it around the
> filesystem hierarchy.
>
> -Peter

This was sort of what I had in my mind.

I think it may make some sense to support "t" and "l" out of the box,
as intuitively backwards-compatible event names, but provide a way to
create new "letters" as needed, with a well-defined way (customisable
or not) of mapping these to event names visible in resctrlfs. I just
used the digits for this purpose, but we could have an explicit
interface for it.

In order for this series to stabilise though, does it make sense to
put this out of scope just for now?

The current series gives a way to provide the mbm_total_bytes and
mbm_local_bytes counters on ABMC and MPAM systems, without having to
limit the total number of monitoring groups (MPAM's current approach)
or overcommit the counters so that they may not be continuously
reliable when there are too many groups (AMD?). That seems
immediately useful.

The ability to assign arbitrarily many counters to a group is a new
feature, however. Does it make sense to consider this on its own
merits once the baseline ABMC interface has been settled?

My main concern right now (from the Arm side) is to be confident that
the initial ABMC interface definition doesn't paint us into a corner.

Cheers
---Dave