This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC). The feature is also called QoS RMID Pinning.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)

# Introduction

AMD hardware can support 256 or more RMIDs. However, the bandwidth
monitoring feature only guarantees that RMIDs currently assigned to a
processor will be tracked by hardware. The counters of any other RMIDs
which are no longer being tracked will be reset to zero. The MBM event
counters return "Unavailable" for the RMIDs that are not active.

Users can create 256 or more monitor groups, but only a limited number of
groups can give guaranteed monitoring numbers. With ever-changing
configurations, there is no way to know definitively which of these groups
will be active at a given point in time. Users do not have the option to
monitor a group or set of groups for a certain period of time without
worrying about the RMID being reset in between.

The ABMC feature lets the user assign an RMID to a hardware counter and
monitor the bandwidth for a longer duration. The assigned RMID will be
active until the user unassigns it manually. There is no need to worry
about counters being reset during this period. Additionally, the user can
specify a bitmask identifying the specific bandwidth types from the given
source to track with the counter.

Without ABMC enabled, monitoring works in the current mode, without the
assignment option.

# Linux Implementation

The Linux resctrl subsystem provides an interface to count a maximum of two
memory bandwidth events per group, from a combination of the available
total and local events.
Keeping the current interface, users can assign a maximum of 2 ABMC
counters per group. Users also have the option to assign only one counter
to a group. If the system runs out of assignable ABMC counters, the kernel
will display an error. Users need to unassign an already assigned counter
to make space for new assignments.

# Examples

a. Check if ABMC support is available.
    # mount -t resctrl resctrl /sys/fs/resctrl/

    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
    [abmc] legacy_mbm

The Linux kernel detected the ABMC feature and it is enabled.

b. Check how many ABMC counters are available.

    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
    32

c. Create a few resctrl groups.

    # mkdir /sys/fs/resctrl/mon_groups/default_mon1
    # mkdir /sys/fs/resctrl/non_defult_group
    # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1

d. This series adds a new interface file,
/sys/fs/resctrl/info/L3_MON/mbm_assign_control, to list and modify the
groups' assignment states.

The list uses the following format:

* Default CTRL_MON group:
    "//<domain_id>=<assignment_flags>"

* Non-default CTRL_MON group:
    "<CTRL_MON group>//<domain_id>=<assignment_flags>"

* Child MON group of default CTRL_MON group:
    "/<MON group>/<domain_id>=<assignment_flags>"

* Child MON group of non-default CTRL_MON group:
    "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"

Assignment flags can be one of the following:

    t   MBM total event is assigned
    l   MBM local event is assigned
    tl  Both total and local MBM events are assigned
    _   None of the MBM events are assigned

Examples:

    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
    non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
    non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
    //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
    /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;

There are four groups, and all of them have both the local and total events
assigned.
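For tooling that consumes this listing, the line format described above can be split mechanically. The following is only a minimal userspace sketch in Python, not part of the series; the function name parse_assign_line is hypothetical, and it assumes group names never contain "/" (they are directory names) and that a trailing ";" may be present, as in the output shown.

```python
def parse_assign_line(line):
    """Split one mbm_assign_control line into its three parts.

    Returns (ctrl_group, mon_group, states), where an empty group name
    stands for the default group and states maps domain id -> set of
    assigned event flags ('t' and/or 'l').
    """
    # "<CTRL_MON group>/<MON group>/<domain list>": split on the first two '/'.
    ctrl_group, mon_group, domains = line.strip().split("/", 2)
    states = {}
    for entry in domains.split(";"):
        if not entry:  # tolerate the trailing ';' seen in the listing
            continue
        domain_id, flags = entry.split("=", 1)
        # "_" means no event is assigned; otherwise each letter is one event.
        states[int(domain_id)] = set() if flags == "_" else set(flags)
    return ctrl_group, mon_group, states
```

Passing the line "/default_mon1/0=tl;1=tl;" yields an empty CTRL_MON name (the default group), the MON group "default_mon1", and both events assigned on domains 0 and 1.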
"//" - This is a default CONTROL MON group "non_defult_group//" - This is non default CONTROL MON group "/default_mon1/" - This is Child MON group of the defult group "non_defult_group/non_default_mon1/" - This is child MON group of the non default group =tl means both total and local events are assigned. e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control. The write format is similar to the above list format with addition of op-code for the assignment operation. * Default CTRL_MON group: "//<domain_id><op-code><assignment_flags>" * Non-default CTRL_MON group: "<CTRL_MON group>//<domain_id><op-code><assignment_flags>" * Child MON group of default CTRL_MON group: "/<MON group>/<domain_id><op-code><assignment_flags>" * Child MON group of non-default CTRL_MON group: "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>" Op-code can be one of the following: = Update the assignment to match the flags + Assign a new state - Unassign a new state _ Unassign all the states Initial group status: # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control non_default_ctrl_mon_grp//0=tl;1=tl; non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; //0=tl;1=tl; /child_default_mon_grp/0=tl;1=tl; To update the default group to assign only total event. 
    # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Assignment status after the update:

    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
    non_default_ctrl_mon_grp//0=tl;1=tl;
    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
    //0=t;1=t;
    /child_default_mon_grp/0=tl;1=tl;

To update the MON group child_default_mon_grp to remove the local event:

    # echo "/child_default_mon_grp/0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Assignment status after the update:

    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
    //0=t;1=t;
    /child_default_mon_grp/0=t;1=t;
    non_default_ctrl_mon_grp//0=tl;1=tl;
    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;

To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp
to remove both the local and total events:

    # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Assignment status after the update:

    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
    //0=t;1=t;
    /child_default_mon_grp/0=t;1=t;
    non_default_ctrl_mon_grp//0=tl;1=tl;
    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_;1=_;

f. Read the events mbm_total_bytes and mbm_local_bytes of the default
group. Reading the events does not change with ABMC. If an event is
unassigned when it is read, the read returns "Unavailable".

    # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
    779247936
    # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
    765207488

g. Users have the option to go back to legacy_mbm mode if required. This
can be done using the following command.

    # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
    abmc [legacy_mbm]

h. Check the bandwidth configuration for the group. Note that the bandwidth
configuration has domain scope. The total event defaults to 0x7F (to count
all the events) and the local event defaults to 0x15 (to count all the
local NUMA events).
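The two default masks quoted above can be decoded bit by bit. The sketch below is illustrative Python, not part of the series; the bit descriptions are reproduced from the "mbm_total_bytes_config" table in the kernel's resctrl documentation as I understand it, and the helper names decode_event_mask and parse_config are hypothetical.

```python
# Bit meanings as listed for mbm_total_bytes_config/mbm_local_bytes_config
# in the kernel resctrl documentation (copied here for illustration only).
EVENT_BITS = {
    0: "reads to memory in the local NUMA domain",
    1: "reads to memory in the non-local NUMA domain",
    2: "non-temporal writes to the local NUMA domain",
    3: "non-temporal writes to the non-local NUMA domain",
    4: "reads to slow memory in the local NUMA domain",
    5: "reads to slow memory in the non-local NUMA domain",
    6: "dirty victims from the QoS domain to all types of memory",
}


def decode_event_mask(mask):
    """Return the descriptions of every event type selected by mask."""
    return [desc for bit, desc in EVENT_BITS.items() if mask & (1 << bit)]


def parse_config(text):
    """Turn a config string such as '0=0x7f;1=0x7f' into {domain: mask}."""
    config = {}
    for entry in text.strip().split(";"):
        if entry:
            domain_id, value = entry.split("=", 1)
            config[int(domain_id)] = int(value, 16)
    return config
```

With this table, 0x15 selects bits 0, 2 and 4, which are exactly the three local-NUMA entries, and 0x7F selects all seven.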
The event bitmap decoding is available at
https://www.kernel.org/doc/Documentation/x86/resctrl.rst
in the sections "mbm_total_bytes_config" and "mbm_local_bytes_config":

    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    0=0x7f;1=0x7f

    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
    0=0x15;1=0x15

i. Change the bandwidth source for domain 0 for the total event to count
only reads. Note that this change affects total events on domain 0.

    # echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    0=0x33;1=0x7F

j. Now read the total event again. mbm_total_bytes should display only the
read events.

    # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
    314101

k. Unmount resctrl.

    # umount /sys/fs/resctrl/

---
v3:
   This series adds support for the global assignment mode discussed in
   the thread:
   https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@xxxxxxx/
   Removed the individual assignment mode and included the global
   assignment interface.
   Added the following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switching
      between ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for listing the groups' assignment states and modifying them.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedback discussed. If I
      missed something, it is not intentional. Please feel free to
      comment.
   f. Sending this as an RFC as per Reinette's comment, so this is still
      open for discussion.

v2:
   a. The major change is the way ABMC is enabled. Earlier, the user
      needed to remount with -o abmc to enable the ABMC feature. That
      option is now removed. Users can now enable ABMC with
      "echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
   b. Added new word 21 to x86/cpufeatures.h.
   c. Display "Unsupported" if the user attempts to read the events when
      ABMC is enabled and the event is not assigned.
   d. Display monitor_state as "Unsupported" when ABMC is disabled.
   e. Text updates and rebase to the latest tip tree (as of Jan 18).
   f. This series is still work in progress. I am yet to hear from ARM
      developers.

v2: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@xxxxxxx/
v1 : https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@xxxxxxx/

Babu Moger (17):
  x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters
    (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Introduce resctrl_file_fflags_init
  x86/resctrl: Introduce the interface to display the assignment state
  x86/resctrl: Introduce interface to display number of ABMC counters
  x86/resctrl: Add support to enable/disable ABMC feature
  x86/resctrl: Initialize assignable counters bitmap
  x86/resctrl: Introduce assign state for the mon group
  x86/resctrl: Add data structures for ABMC assignment
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  x86/resctrl: Add the functionality to assign the RMID
  x86/resctrl: Add the functionality to unassign the RMID
  x86/resctrl: Enable ABMC by default on resctrl mount
  x86/resctrl: Introduce the interface switch between ABMC and
    legacy_mbm
  x86/resctrl: Introduce interface to list assignment states of all the
    groups
  x86/resctrl: Introduce interface to modify assignment states of the
    groups

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            | 144 ++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/resctrl/core.c            |  25 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  56 +-
 arch/x86/kernel/cpu/resctrl/monitor.c         |  24 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 714 +++++++++++++++++-
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |  12 +
 11 files changed, 964 insertions(+), 20 deletions(-)

-- 
2.34.1