Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
# Linux Implementation

Linux resctrl subsystem provides the interface to count maximum of two memory bandwidth events per group, from a combination of available total and local events. Keeping the current interface, users can enable a maximum of 2 ABMC counters per group. User will also have the option to enable only one counter to the group. If the system runs out of assignable ABMC counters, kernel will display an error. Users need to disable an already enabled counter to make space for new assignments.
The implementation appears to be converging on an interface that is generic enough to be used by other features discussed along the way. The "Linux Implementation" summary could thus add:

Create a generic interface aimed at supporting user space assignment of scarce counters used for monitoring. The first user of the interface is ABMC, with the option to expand usage to "soft-RMID" and MPAM counters in the future.
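The assignment model in the quoted summary (one counter per MBM event, so at most two per group, an error once the pool is exhausted, and unassigning to make room) can be sketched as a small allocator. This is a hypothetical user-space model, not the kernel implementation: `CounterPool`, `CounterExhausted` and the use of a single shared pool (rather than, say, one pool per domain) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# The two MBM events a group can have a counter for.
EVENTS = ("total", "local")


class CounterExhausted(Exception):
    """Raised when no free counter is left (the error case in the summary)."""


class CounterPool:
    """Hypothetical model of a fixed pool of assignable MBM counters."""

    def __init__(self, num_cntrs: int) -> None:
        self.free: List[int] = list(range(num_cntrs))   # unused counter IDs
        self.assigned: Dict[Tuple[str, str], int] = {}  # (group, event) -> ID

    def assign(self, group: str, event: str) -> int:
        if event not in EVENTS:
            raise ValueError(f"unknown event {event!r}")
        key = (group, event)
        if key in self.assigned:  # already has a counter: nothing to do
            return self.assigned[key]
        if not self.free:
            raise CounterExhausted(f"no free counter for {group}:{event}")
        cntr = self.free.pop(0)
        self.assigned[key] = cntr
        return cntr

    def unassign(self, group: str, event: str) -> None:
        # Return the counter to the pool so another group can use it.
        cntr = self.assigned.pop((group, event), None)
        if cntr is not None:
            self.free.append(cntr)

    def events_of(self, group: str) -> List[str]:
        return [e for e in EVENTS if (group, e) in self.assigned]
```

Keying the pool by (group, event) makes the per-group limit of two follow from there being only two MBM events, without a separate check. With `CounterPool(32)` the pool would match the 32 counters reported in example b.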
# Examples

a. Check if ABMC support is available

#mount -t resctrl resctrl /sys/fs/resctrl/

#cat /sys/fs/resctrl/info/L3_MON/mbm_mode
[abmc] legacy

Linux kernel detected ABMC feature and it is enabled.
How about renaming "abmc" to "mbm_cntrs"? This would match the num_mbm_cntrs info file and be the final step in making this generic, so that another architecture can more easily support assigning hardware counters without needing to call the feature AMD's "abmc". Expanding on this, it may be possible to add a new "sw_mbm_cntrs" feature that would be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name, so that when user space enables that feature its properties can be found in "num_mbm_cntrs". The "abmc" kernel parameter remains, but that does seem separate from this resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC, which surely makes it architecture specific.
b. Check how many ABMC counters are available.

#cat /sys/fs/resctrl/info/L3_MON/num_cntrs
32
This is now num_mbm_cntrs.
c. Create few resctrl groups.

# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp

d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_control to list and modify the group's monitoring states. File provides single place to list monitoring states of all the resctrl groups. It makes it easier for user space to learn about the counters are used without needing to traverse all the groups thus reducing the number of filesystem calls.

The list follows the following format:

"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

Format for specific type of groups:

* Default CTRL_MON group:
  "//<domain_id>=<flags>"

* Non-default CTRL_MON group:
  "<CTRL_MON group>//<domain_id>=<flags>"

* Child MON group of default CTRL_MON group:
  "/<MON group>/<domain_id>=<flags>"

* Child MON group of non-default CTRL_MON group:
  "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

Flags can be one of the following:

  t  MBM total event is enabled.
  l  MBM local event is enabled.
  tl Both total and local MBM events are enabled.
  _  None of the MBM events are enabled
The language needs to be changed here (and in the many copied places) to be specific about what setting the flag accomplishes. For example, in "legacy" mode user space can be expected to find all events enabled, no? Needing a new feature to set a flag in order to accomplish something that is already possible in legacy mode can thus cause confusion.

If I understand the implementation correctly, reading "mbm_control" will fail if the system is ABMC capable but ABMC is disabled. Why can "mbm_control" not always be displayed to user space? For example, what if "mbm_control" is always available to user space and provides specific information, such as:

  t  MBM total event is enabled but may not always be counted.
  T  MBM total event is enabled and being counted.

On AMD systems, resource groups will have "t" associated with monitor groups when ABMC is disabled, and "T" when ABMC is enabled and a counter is assigned. On Intel systems, monitor groups will always have "T". For "soft-RMID" the flag could possibly continue to be "T"?

I am trying to find ways to communicate to user space consistently and clearly, and any insights will be appreciated. We really do not want to add this interface and then find that it just causes confusion.

It is not quite obvious to me when the new files should be visible and what they should present to the user. "mbm_mode" is now always visible. Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs" appears to be associated only with ABMC; should it not also, for example, be the file that "soft-RMID" uses to share how many counters are available? Its contents will thus be dynamic based on which "MBM mode" is active, which raises the question of what it should contain when "legacy" mode is enabled. Should "num_mbm_cntrs" perhaps show "0" to user space when "legacy" mode is active?
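To make the quoted read format concrete, here is a minimal user-space parser sketch. It assumes one group per line, exactly the "<CTRL_MON group>/<MON group>/<domain_id>=<flags>" syntax, and only the t, l, tl and _ flags from this version of the series. It does not handle the "T" flag floated above. `GroupState`, `parse_flags` and `parse_line` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Flag characters from the list format in the cover letter.
FLAG_EVENTS = {"t": "total", "l": "local"}


@dataclass
class GroupState:
    ctrl: str                      # CTRL_MON group name, "" for the default group
    mon: str                       # MON group name, "" for a CTRL_MON group itself
    domains: Dict[int, List[str]]  # domain_id -> MBM events flagged there

    def kind(self) -> str:
        """Name the group type using the four cases in the cover letter."""
        if not self.mon:
            return "non-default CTRL_MON" if self.ctrl else "default CTRL_MON"
        if self.ctrl:
            return "child MON of non-default CTRL_MON"
        return "child MON of default CTRL_MON"


def parse_flags(flags: str) -> List[str]:
    """Turn "t", "l", "tl" or "_" into a list of event names."""
    if flags == "_":
        return []
    try:
        return [FLAG_EVENTS[ch] for ch in flags]
    except KeyError as err:
        raise ValueError(f"unknown flag in {flags!r}") from err


def parse_line(line: str) -> GroupState:
    """Parse one "<CTRL_MON group>/<MON group>/<domain_id>=<flags>;..." line."""
    ctrl, mon, rest = line.strip().split("/", 2)
    domains: Dict[int, List[str]] = {}
    for entry in rest.split(";"):
        if not entry:  # each line ends with a trailing ';'
            continue
        dom, flags = entry.split("=", 1)
        domains[int(dom)] = parse_flags(flags)
    return GroupState(ctrl, mon, domains)
```

Splitting on at most two "/" characters keeps the empty CTRL_MON and MON names that identify the default groups. That makes the group type recoverable from the line alone, without walking the resctrl directory tree.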
Examples:

# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;

There are four groups and all the groups have local and total event enabled on domain 0 and 1.

"local and total event" is vague; can it be made specific with, for example, "local and total MBM events"?
=tl means both total and local events are enabled.
Same here (and in all copied places in this series).

"//" - This is a default CTRL_MON group
"non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group
"/child_default_mon_grp/" - This is Child MON group of the defult group

Same typos as in the previous version of the cover letter.

"non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child MON group of the non-default group

e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_control.

The write format is similar to the above list format with addition of op-code for the assignment operation.

* Default CTRL_MON group:
  "//<domain_id><op-code><flags>"

* Non-default CTRL_MON group:
  "<CTRL_MON group>//<domain_id><op-code><flags>"

* Child MON group of default CTRL_MON group:
  "/<MON group>/<domain_id><op-code><flags>"

* Child MON group of non-default CTRL_MON group:
  "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"

Op-code can be one of the following:

  = Update the assignment to match the flag.
  + Assign a new state.
  - Unassign a new state.
Please be consistent with terminology. The text above switches between "flag" and "state", while the text below continues with "event". Also, "Unassign a _new_ state" is unexpected; it should probably be an _existing_ (not "new") state/flag/event?
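The write format and op-codes in the cover letter can be modelled as set operations on the events assigned to one group in one domain, treating "-" as removing an existing assignment. This is a hypothetical sketch: `parse_write`, `apply_write` and the dictionary keyed by (CTRL_MON group, MON group, domain_id) are illustrative only. The sketch does not model running out of counters. It does enforce the cover letter's rule that "_" only works with the "=" op-code.

```python
from typing import Dict, Set, Tuple

FLAG_EVENTS = {"t": "total", "l": "local"}

# Hypothetical model: (CTRL_MON group, MON group, domain_id) -> assigned events.
State = Dict[Tuple[str, str, int], Set[str]]


def parse_write(cmd: str) -> Tuple[str, str, int, str, Set[str]]:
    """Split "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"."""
    ctrl, mon, rest = cmd.strip().split("/", 2)
    # The domain ID is numeric, so the first non-digit is the op-code.
    pos = next((i for i, ch in enumerate(rest) if not ch.isdigit()), len(rest))
    dom, op, flags = rest[:pos], rest[pos:pos + 1], rest[pos + 1:]
    if not dom or op not in ("=", "+", "-"):
        raise ValueError(f"malformed command {cmd!r}")
    if flags == "_":
        if op != "=":
            raise ValueError("'_' only works with the '=' op-code")
        return ctrl, mon, int(dom), op, set()
    if not flags or any(ch not in FLAG_EVENTS for ch in flags):
        raise ValueError(f"bad flags {flags!r}")
    return ctrl, mon, int(dom), op, {FLAG_EVENTS[ch] for ch in flags}


def apply_write(state: State, cmd: str) -> None:
    """Apply one write to the modelled assignment state."""
    ctrl, mon, dom, op, events = parse_write(cmd)
    key = (ctrl, mon, dom)
    current = state.get(key, set())
    if op == "=":
        state[key] = set(events)       # replace the assignment outright
    elif op == "+":
        state[key] = current | events  # assign additional events
    else:
        state[key] = current - events  # unassign events that are assigned
```

Feeding this sketch the echo commands from the cover letter's examples ("//0=t", "/child_default_mon_grp/1-t", the "1=_" write and "//0+l") reproduces the flag changes shown in the corresponding "Assignment status after the update" listings.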
Flags can be one of the following:

  t  MBM total event.
  l  MBM local event.
  tl Both total and local MBM events.
  _  None of the MBM events. Only works with '=' op-code.

Initial group status:

# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;

To update the default group to enable only total event on domain 0:

# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control

Assignment status after the update:

# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=tl;

To update the MON group child_default_mon_grp to remove total event on domain 1:

# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control

Assignment status after the update:

$ cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=l;

To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to remove both local and total events on domain 1:

# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_control

Assignment status after the update:

non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=l;

To update the default group to add a local event domain 0.

# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control

Assignment status after the update:

# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=l;

f. Read the event mbm_total_bytes and mbm_local_bytes of the default group. There is no change in reading the events with ABMC.
If the event is unassigned when reading, then the read will come back as "Unassigned".

# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
779247936
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
765207488

g. Users will have the option to go back to legacy mbm_mode if required. This can be done using the following command. Note that switching the mbm_mode will reset all the mbm counters of all resctrl groups.
mbm -> MBM (throughout)
# echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
abmc [legacy]

h. Check the bandwidth configuration for the group. Note that bandwidth configuration has a domain scope. Total event defaults to 0x7F (to count all the events) and local event defaults to 0x15 (to count all the local numa events). The event bitmap decoding is available at https://www.kernel.org/doc/Documentation/x86/resctrl.rst in section "mbm_total_bytes_config", "mbm_local_bytes_config":

#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x7f;1=0x7f

#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x15;1=0x15

j. Change the bandwidth source for domain 0 for the total event to count only reads. Note that this change effects total events on the domain 0.

#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x33;1=0x7F

k. Now read the total event again. The first read will come back with "Unavailable" status. The subsequent read of mbm_total_bytes will display only the read events.

#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
Unavailable
#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
314101

l. Unmount the resctrl

#umount /sys/fs/resctrl/
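For scripts that consume these files, here is a minimal sketch of helpers a monitoring tool might use. One set treats the "Unassigned" and "Unavailable" strings from items f and k as "no value" rather than a number. The other decodes the per-domain event configuration bitmaps from items h and j, using the bit meanings listed in the "mbm_total_bytes_config" section of the resctrl documentation referenced in item h. The helper names are hypothetical, and the sketch does not assume any particular mount point.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Bit meanings of mbm_total_bytes_config / mbm_local_bytes_config, following
# the "mbm_total_bytes_config" section of the resctrl documentation.
EVENT_BITS = {
    0: "reads to memory in the local NUMA domain",
    1: "reads to memory in the non-local NUMA domain",
    2: "non-temporal writes to local NUMA domain",
    3: "non-temporal writes to non-local NUMA domain",
    4: "reads to slow memory in the local NUMA domain",
    5: "reads to slow memory in the non-local NUMA domain",
    6: "dirty victims from the QOS domain to all types of memory",
}

# Status strings a counter file may return instead of a byte count.
NO_VALUE = ("Unassigned", "Unavailable")


def parse_counter(text: str) -> Optional[int]:
    """Return the byte count, or None when the file reports a status string."""
    text = text.strip()
    if text in NO_VALUE:
        return None
    return int(text)


def read_counter(path: Path) -> Optional[int]:
    """Read e.g. .../mon_data/mon_L3_00/mbm_total_bytes."""
    return parse_counter(path.read_text())


def decode_event_config(line: str) -> Dict[int, List[str]]:
    """Decode "0=0x33;1=0x7F" into domain_id -> list of counted sources."""
    result: Dict[int, List[str]] = {}
    for entry in line.strip().split(";"):
        if not entry:
            continue
        dom, value = entry.split("=", 1)
        mask = int(value, 16)  # accepts both "0x7f" and "0x7F"
        result[int(dom)] = [EVENT_BITS[b] for b in sorted(EVENT_BITS)
                            if mask & (1 << b)]
    return result
```

Decoding 0x33 this way yields only the four "reads" sources (bits 0, 1, 4 and 5), and 0x15 yields only the three local sources (bits 0, 2 and 4). This matches the descriptions in items h and j.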
Reinette