Hi Peter,

On 5/1/24 12:48, Peter Newman wrote:
> Hi Babu,
>
> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@xxxxxxx> wrote:
>>
>> This series adds support for Assignable Bandwidth Monitoring Counters
>> (ABMC), also called the QoS RMID Pinning feature.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)
>>
>> # Introduction
>>
>> AMD hardware can support 256 or more RMIDs. However, the bandwidth
>> monitoring feature only guarantees that RMIDs currently assigned to a
>> processor will be tracked by hardware. The counters of any other RMIDs,
>> which are no longer being tracked, will be reset to zero. The MBM event
>> counters return "Unavailable" for RMIDs that are not active.
>>
>> Users can create 256 or more monitor groups, but only a limited number
>> of groups can give guaranteed monitoring numbers. With ever-changing
>> configurations, there is no way to know for certain which of these
>> groups will be active at a given point in time. Users do not have the
>> option to monitor a group or set of groups for a certain period of time
>> without worrying about the RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign an RMID to a
>> hardware counter and monitor the bandwidth for a longer duration.
>> The assigned RMID will be active until the user unassigns it manually.
>> There is no need to worry about counters being reset during this period.
>> Additionally, the user can specify a bitmask identifying the specific
>> bandwidth types from the given source to track with the counter.
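For illustration, the assignment model described above (an RMID/event pair is bound to one of a limited number of hardware counters and stays bound until it is explicitly unassigned) can be sketched as a small counter pool. This is a hypothetical user-space model, not the resctrl code: the names cntr_find(), cntr_assign(), cntr_unassign(), the EVT_* identifiers, and NUM_CNTRS = 32 (chosen to match the mbm_assign_cntrs example in the cover letter) are all assumptions, and nothing here programs real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool size, matching the "32" in the mbm_assign_cntrs example. */
#define NUM_CNTRS 32

/* Hypothetical event identifiers (not the kernel's enum values). */
enum mbm_evt { EVT_TOTAL, EVT_LOCAL };

/* One hardware counter: which RMID/event pair it tracks, if any. */
struct cntr {
	bool in_use;
	uint32_t rmid;
	enum mbm_evt evt;
};

/* Counters are indexed contiguously from 0; static storage starts all-free. */
static struct cntr cntrs[NUM_CNTRS];

/* Return the index of the counter tracking (rmid, evt), or -1 if none. */
static int cntr_find(uint32_t rmid, enum mbm_evt evt)
{
	assert(evt == EVT_TOTAL || evt == EVT_LOCAL);
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (cntrs[i].in_use && cntrs[i].rmid == rmid &&
		    cntrs[i].evt == evt)
			return i;
	}
	return -1;
}

/*
 * Bind (rmid, evt) to a free counter and return its index. An existing
 * assignment is reused. Returns -1 when every counter is in use; the
 * caller must unassign something first rather than overwrite a counter.
 */
static int cntr_assign(uint32_t rmid, enum mbm_evt evt)
{
	int id = cntr_find(rmid, evt);

	if (id >= 0)
		return id;
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (!cntrs[i].in_use) {
			cntrs[i] = (struct cntr){ .in_use = true, .rmid = rmid, .evt = evt };
			return i;
		}
	}
	return -1;
}

/* Release the counter tracking (rmid, evt); rmid and evt are what locate it. */
static int cntr_unassign(uint32_t rmid, enum mbm_evt evt)
{
	int id = cntr_find(rmid, evt);

	if (id >= 0)
		cntrs[id].in_use = false;
	return id;
}
```

In this model, unassigning needs the RMID and event to find the counter, and assignment simply fails once the pool is exhausted, so making room for a new group means unassigning an existing one first.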
>>
>> Without ABMC enabled, monitoring will work in the current mode without
>> the assignment option.
>>
>> # Linux Implementation
>>
>> The Linux resctrl subsystem provides the interface to count a maximum of
>> two memory bandwidth events per group, from a combination of available
>> total and local events. Keeping the current interface, users can assign
>> a maximum of 2 ABMC counters per group. Users will also have the option
>> to assign only one counter to the group. If the system runs out of
>> assignable ABMC counters, the kernel will display an error. Users need
>> to unassign an already assigned counter to make space for new
>> assignments.
>>
>> # Examples
>>
>> a. Check if ABMC support is available.
>>
>> # mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> [abmc]
>> legacy_mbm
>>
>> The Linux kernel detected the ABMC feature, and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
>> 32
>>
>> c. Create a few resctrl groups.
>>
>> # mkdir /sys/fs/resctrl/mon_groups/default_mon1
>> # mkdir /sys/fs/resctrl/non_default_group
>> # mkdir /sys/fs/resctrl/non_default_group/mon_groups/non_default_mon1
>>
>> d. This series adds a new interface file,
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control, to list and modify the
>> groups' assignment states.
>>
>> The list uses the following format:
>>
>> * Default CTRL_MON group:
>>   "//<domain_id>=<assignment_flags>"
>>
>> * Non-default CTRL_MON group:
>>   "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>> * Child MON group of default CTRL_MON group:
>>   "/<MON group>/<domain_id>=<assignment_flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>>   "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>> Assignment flags can be one of the following:
>>
>>  t   MBM total event is assigned
>>  l   MBM local event is assigned
>>  tl  Both total and local MBM events are assigned
>>  _   None of the MBM events are assigned
>>
>
> I was able to successfully build a kernel where this interface is
> adapted to work with both real ABMC on hardware that supports it and
> my software workaround for older hardware.

Thanks for trying that out. Good to know.

>
> My prototype is based on a refactored version of the codebase
> supporting MPAM, but the capabilities of the MPAM hardware look
> similar enough to ABMC that I'm not concerned about the feasibility.

That is good.

>
> The FS layer is informed by the arch layer (through rdt_resource
> fields) how many assignable monitors are available and whether a
> monitor is assigned to an entire group or a single event in a group.
> Also, the FS layer can assume that monitors are indexed contiguously,
> allowing it to host the data structures managing the FS-level view of
> monitor usage.
>
> I used the following resctrl_arch interfaces to propagate assignments
> to the implementation:
>
> void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id,
>                                  u32 closid, u32 rmid, int evtid);

Sure. I can add these in the next version.

A few comments:
AMD does not need closid for assignment. I assume ARM requires closid.
What is mon_id here?

> void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);

We need rmid and evtid for the unassign interface here.
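The mbm_assign_control line format quoted above can be illustrated with a small parser. This is a hypothetical user-space sketch, not the kernel's parser: it assumes one `<domain_id>=<assignment_flags>` pair per line with the trailing newline already stripped, and struct assign_entry, copy_field(), and parse_assign_entry() are names made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one mbm_assign_control line. */
struct assign_entry {
	char ctrl[64];          /* "" selects the default CTRL_MON group */
	char mon[64];           /* "" selects the CTRL_MON group itself */
	unsigned long domain;   /* <domain_id> */
	bool total;             /* 't': MBM total event assigned */
	bool local;             /* 'l': MBM local event assigned */
};

/* Copy len bytes of src into dst (capacity n) as a C string. */
static bool copy_field(char *dst, size_t n, const char *src, size_t len)
{
	if (len >= n)
		return false;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return true;
}

/* Parse "<CTRL_MON group>/<MON group>/<domain_id>=<flags>". */
static bool parse_assign_entry(const char *line, struct assign_entry *e)
{
	const char *s1, *s2, *eq, *f;
	char *end;

	assert(line && e);
	s1 = strchr(line, '/');
	s2 = s1 ? strchr(s1 + 1, '/') : NULL;
	eq = s2 ? strchr(s2 + 1, '=') : NULL;
	if (!eq)
		return false;

	if (!copy_field(e->ctrl, sizeof(e->ctrl), line, (size_t)(s1 - line)) ||
	    !copy_field(e->mon, sizeof(e->mon), s1 + 1, (size_t)(s2 - s1 - 1)))
		return false;

	/* The domain ID must be a plain decimal number ending at '='. */
	if (!isdigit((unsigned char)s2[1]))
		return false;
	e->domain = strtoul(s2 + 1, &end, 10);
	if (end != eq)
		return false;

	/* "_" means no events assigned; otherwise accept 't' and/or 'l' once each. */
	e->total = e->local = false;
	f = eq + 1;
	if (strcmp(f, "_") == 0)
		return true;
	if (*f == '\0')
		return false;
	for (; *f; f++) {
		if (*f == 't' && !e->total)
			e->total = true;
		else if (*f == 'l' && !e->local)
			e->local = true;
		else
			return false;
	}
	return true;
}
```

An empty CTRL_MON field selects the default group and an empty MON field selects the CTRL_MON group itself, which covers all four forms listed above. The sketch also accepts "lt" as a spelling of "tl"; a stricter parser could reject it.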
>
> I chose to allow reassigning an assigned monitor without calling
> unassign first. This is important when monitors are unassigned and
> assigned in a single write to mbm_assign_control, as it allows all
> updates to be performed in a single round of parallel IPIs to the
> domains.

Yes. It is not required to call unassign before assign; the hardware
(AMD) supports it. But we only have 32 counters, so we need to know
which counter we are going to use for the assignment. If all the
counters are already assigned, we can't figure out the counter ID
without calling unassign first. Using a random counter would overwrite
an already assigned counter.

>
>
>>
>> g. Users will have the option to go back to legacy_mbm mode if required.
>> This can be done using the following command.
>>
>> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> abmc
>> [legacy_mbm]
>
> I chose to make this a mount option to simplify the management of the
> monitor tracking data structures. They are simply allocated at mount
> time and deallocated at unmount.

Initially I added it as a mount option. Based on our earlier discussion,
we decided to use the assign feature by default if the hardware supports
it. Users don't have to worry about the details.

>
> I called the option "mon_assign": The mount option parser calls
> resctrl_arch_mon_assign_enable() to determine whether the
> implementation supports assignment in some form. If it returns an
> error, the mount fails. When successful, the assignable monitor count
> is made non-zero in the appropriate rdt_resource, triggering the
> behavior change in the FS layer.
>
> I'm still not sure if it's a good idea to enable monitor assignment by
> default. This would be a major disruption in the MBM usage model
> triggered by moving software between AMD CPU models. I thought the

Why would it be a disruption? Why do you think a mount option will solve
the problem?
As always, there will be an option to go back to legacy mode, right?

> safest option was to disallow creating more monitoring groups than
> monitors unless the option is selected. Given that nobody else

The current code allows creating more groups, but it will report
"Monitor assignment failed" when it runs out of monitors.

> complained about monitoring HW limitations on the mailing list, I
> assumed few users create enough monitoring groups to be impacted.
>
> Thanks!
> -Peter

--
Thanks
Babu Moger