Hi Reinette,

On 7/12/24 17:03, Reinette Chatre wrote:
> Hi Babu,
>
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> # Linux Implementation
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can enable a maximum
>> of 2 ABMC counters per group. User will also have the option to enable only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to disable an already
>> enabled counter to make space for new assignments.
>
> The implementation appears to be converging on an interface that can
> be generic enough to be used by other features discussed along the way.
> "Linux implementation" summary can thus add:
>
>    Create a generic interface aimed to support user space assignment
>    of scarce counters used for monitoring. First usage of interface
>    is by ABMC with option to expand usage to "soft-RMID" and MPAM
>    counters in future.

Sure.

>
>
>> # Examples
>>
>> a. Check if ABMC support is available
>>    #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>>    #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>    [abmc]
>>    legacy
>>
>>    Linux kernel detected ABMC feature and it is enabled.
>
> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
> info file and be the final step to make this generic so that another
> architecture can more easily support assignining hardware counters without
> needing to call the feature AMD's "abmc".

I think we already settled this with "mbm_cntr_assignable". For "soft-RMID"
it will be "mbm_sw_assignable".

>
> Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
> will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
> so that when user space enables that feature its properties can be found in
> "num_mbm_cntrs".
>
> The "abmc" kernel parameter remains but that does seem separate from this
> resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
> making it architecture specific.
>
>>
>> b. Check how many ABMC counters are available.
>>
>>    #cat /sys/fs/resctrl/info/L3_MON/num_cntrs
>>    32
>
> This is now num_mbm_cntrs

Sure.

>
>>
>> c. Create few resctrl groups.
>>
>>    # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>>    # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>>    # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>>
>> d. This series adds a new interface file
>>    /sys/fs/resctrl/info/L3_MON/mbm_control
>>    to list and modify the group's monitoring states. File provides single
>>    place to list monitoring states of all the resctrl groups. It makes it
>>    easier for user space to learn about the counters are used without
>>    needing to traverse all the groups thus reducing the number of
>>    filesystem calls.
>>
>>    The list follows the following format:
>>
>>    "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>>    Format for specific type of groups:
>>
>>    * Default CTRL_MON group:
>>      "//<domain_id>=<flags>"
>>
>>    * Non-default CTRL_MON group:
>>      "<CTRL_MON group>//<domain_id>=<flags>"
>>
>>    * Child MON group of default CTRL_MON group:
>>      "/<MON group>/<domain_id>=<flags>"
>>
>>    * Child MON group of non-default CTRL_MON group:
>>      "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>>    Flags can be one of the following:
>>
>>     t  MBM total event is enabled.
>>     l  MBM local event is enabled.
>>     tl Both total and local MBM events are enabled.
>>     _  None of the MBM events are enabled
>
> The language needs to be changed here (and in the many copied places) to
> be specific about what setting the flag accomplishes. For example, in
> "legacy" mode user space can be expected to find all events enabled, no?
> Needing a new feature to set a flag to accomplish something that is
> possible in legacy mode can thus cause confusion.

Yes, it is possible to do it, but I feel it is unnecessary.

>
> If I understand the implementation reading "mbm_control" will fail
> if system is ABMC capable but it is disabled. Why can "mbm_control" not
> always be displayed to user space? For example, what if "mbm_control" is
> always available to user space and it can provide specific information to
> user space. For example:
>  t MBM total event is enabled but may not always be counted.
>  T MBM total event is enabled and being counted.
>
> On AMD systems resource groups will have "t" associated with monitor
> groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
> On Intel systems monitor groups will always have "T".

I think more flags will add more confusion.

>
> For "soft-RMID" the flag could possible continue to be "T"?
>
> I am trying to find ways to communicate to user space consistently
> and clearly and any insights will be appreciated. We really do not want
> to add this interface and then find that it just causes confusion.
>
> It is not quite obvious to me when the new files should be visible and
> what they should present to the user. "mbm_mode" is now always visible.
> Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
> appears to be only associated to ABMC, should it not also, for example,
> be the file that "soft-RMID" may use to share how many counters are
> available? Its contents will thus be dynamic based on which "MBM mode" is
> active, begging the question, what should it contain when "legacy" mode is
> enabled, should "num_mbm_cntrs" perhaps show "0" to user space when
> "legacy" mode is active?

It's good that we are having this discussion. How about we go with the
simple way for now: mbm_mode will only be available when ABMC or
Soft_RMID (MPAM feature) is supported. The same goes for num_mbm_cntrs.

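For anyone scripting against the listing format quoted above, here is a minimal sketch in Python of how one line could be split up. It is only an illustration and not part of the series: the function name parse_mbm_control_line is hypothetical, and it assumes that group names never contain '/' and that the flags are limited to the t, l and _ characters from the cover letter.

```python
def parse_mbm_control_line(line):
    """Split one listing line into (ctrl_group, mon_group, {domain_id: events}).

    Hypothetical helper: assumes the "<CTRL_MON group>/<MON group>/<list>"
    layout from the cover letter, with an empty name meaning "default".
    """
    # The first two '/' separate the two group names from the domain list.
    ctrl_group, mon_group, domains = line.strip().split("/", 2)
    states = {}
    for entry in domains.split(";"):
        if not entry:  # a trailing ';' leaves an empty field
            continue
        domain_id, flags = entry.split("=", 1)
        # '_' means that none of the MBM events are enabled on the domain.
        states[int(domain_id)] = set() if flags == "_" else set(flags)
    return ctrl_group, mon_group, states


if __name__ == "__main__":
    for sample in ("//0=t;1=tl;", "non_default_ctrl_mon_grp//0=tl;1=_;"):
        print(parse_mbm_control_line(sample))
```

An empty first field denotes the default CTRL_MON group and an empty second field means the line describes the CTRL_MON group itself rather than a child MON group.
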
>
>>
>>    Examples:
>>
>>    # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>    non_default_ctrl_mon_grp//0=tl;1=tl;
>>    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>    //0=tl;1=tl;
>>    /child_default_mon_grp/0=tl;1=tl;
>>
>>    There are four groups and all the groups have local and total
>>    event enabled on domain 0 and 1.
>
> "local and total event" is vague, can it be made specific with, for example,
> "local and total MBM events"

Sure.

>
>>
>>    =tl means both total and local events are enabled.
>
> Same here (and all copied places in this series)

Sure.

>
>>
>>    "//" - This is a default CTRL_MON group
>>
>>    "non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group
>>
>>    "/child_default_mon_grp/" - This is Child MON group of the defult
>>    group
>
> Same typos as in previous version of cover letter.

Oh, no. Will fix it.

>
>>
>>    "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
>>    MON group of the non-default group
>>
>> e. Update the group assignment states using the interface file
>>    /sys/fs/resctrl/info/L3_MON/mbm_control.
>>
>>    The write format is similar to the above list format with addition of
>>    op-code for the assignment operation.
>>
>>    * Default CTRL_MON group:
>>      "//<domain_id><op-code><flags>"
>>
>>    * Non-default CTRL_MON group:
>>      "<CTRL_MON group>//<domain_id><op-code><flags>"
>>
>>    * Child MON group of default CTRL_MON group:
>>      "/<MON group>/<domain_id><op-code><flags>"
>>
>>    * Child MON group of non-default CTRL_MON group:
>>      "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>
>>    Op-code can be one of the following:
>>
>>    = Update the assignment to match the flag.
>>    + Assign a new state.
>>    - Unassign a new state.
>
> Please be consistent with terminology. Above switches between "flag"
> and "state" while it then continues below using "event". Also,
> "Unassign a _new_ state" is unexpected, it should probably be an
> _existing_ (not "new") state/flag/event?

I will use "event" consistently.

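To make the intended meaning of the three op-codes concrete, here is a small Python model of how I read them, treating the events of one group on one domain as a set. This is a sketch under assumptions, not the kernel implementation: apply_op and format_request are hypothetical names, counter exhaustion and other error paths are not modelled, and the rule that '_' only works with '=' is taken from the flag list in the cover letter.

```python
VALID_EVENTS = {"t", "l"}  # MBM total and MBM local


def apply_op(current, op, flags):
    """Return the events enabled after applying <op-code><flags> to `current`."""
    requested = set() if flags == "_" else set(flags)
    if not requested <= VALID_EVENTS:
        raise ValueError(f"unknown event flag in {flags!r}")
    if flags == "_" and op != "=":
        raise ValueError("'_' is only meaningful with the '=' op-code")
    if op == "=":  # make the assignment match the given events exactly
        return requested
    if op == "+":  # assign the given events, keep the existing ones
        return set(current) | requested
    if op == "-":  # unassign the given (existing) events
        return set(current) - requested
    raise ValueError(f"unknown op-code {op!r}")


def format_request(ctrl_group, mon_group, domain_id, op, flags):
    """Build the string that would be written to mbm_control."""
    return f"{ctrl_group}/{mon_group}/{domain_id}{op}{flags}"


if __name__ == "__main__":
    print(format_request("", "", 0, "=", "t"))
    print(sorted(apply_op({"t", "l"}, "-", "t")))
```

With this reading, '=' replaces the whole set while '+' and '-' only touch the named events, which is why '_' has no sensible meaning for the latter two.
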
>
>>
>>    Flags can be one of the following:
>>
>>    t  MBM total event.
>>    l  MBM local event.
>>    tl Both total and local MBM events.
>>    _  None of the MBM events. Only works with '=' op-code.
>>
>>    Initial group status:
>>    # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>    non_default_ctrl_mon_grp//0=tl;1=tl;
>>    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>    //0=tl;1=tl;
>>    /child_default_mon_grp/0=tl;1=tl;
>>
>>    To update the default group to enable only total event on domain 0:
>>    # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>    Assignment status after the update:
>>    # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>    non_default_ctrl_mon_grp//0=tl;1=tl;
>>    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>    //0=t;1=tl;
>>    /child_default_mon_grp/0=tl;1=tl;
>>
>>    To update the MON group child_default_mon_grp to remove total event
>>    on domain 1:
>>    # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>    Assignment status after the update:
>>    $ cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>    non_default_ctrl_mon_grp//0=tl;1=tl;
>>    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>    //0=t;1=tl;
>>    /child_default_mon_grp/0=tl;1=l;
>>
>>    To update the MON group
>>    non_default_ctrl_mon_grp/child_non_default_mon_grp to
>>    remove both local and total events on domain 1:
>>    # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>    Assignment status after the update:
>>    non_default_ctrl_mon_grp//0=tl;1=tl;
>>    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>    //0=t;1=tl;
>>    /child_default_mon_grp/0=tl;1=l;
>>
>>    To update the default group to add a local event domain 0.
>>    # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>    Assignment status after the update:
>>    # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>    non_default_ctrl_mon_grp//0=tl;1=tl;
>>    non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>    //0=tl;1=tl;
>>    /child_default_mon_grp/0=tl;1=l;
>>
>>
>> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>>    There is no change in reading the events with ABMC. If the event is
>>    unassigned when reading, then the read will come back as "Unassigned".
>>
>>    # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>    779247936
>>    # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>    765207488
>>
>> g. Users will have the option to go back to legacy mbm_mode if required.
>>    This can be done using the following command. Note that switching the
>>    mbm_mode will reset all the mbm counters of all resctrl groups.
>
> mbm -> MBM (throughout)

Sure.

>
>>
>>    # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>>    # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>    abmc
>>    [legacy]
>>
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>>    configuration has a domain scope. Total event defaults to 0x7F (to
>>    count all the events) and local event defaults to 0x15 (to count all
>>    the local numa events). The event bitmap decoding is available at
>>    https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>>    in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>
>>    #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>    0=0x7f;1=0x7f
>>
>>    #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>    0=0x15;1=0x15
>>
>> j. Change the bandwidth source for domain 0 for the total event to count
>>    only reads.
>>    Note that this change effects total events on the domain 0.
>>
>>    #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>    #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>    0=0x33;1=0x7F
>>
>> k. Now read the total event again.
>>    The first read will come back with "Unavailable" status. The
>>    subsequent read of mbm_total_bytes will display only the read events.
>>
>>    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>    Unavailable
>>    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>    314101
>>
>> l. Unmount the resctrl
>>
>>    #umount /sys/fs/resctrl/
>>
>
> Reinette
>

-- 
Thanks
Babu Moger