Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
# Linux Implementation

Linux resctrl subsystem provides the interface to count maximum of two
memory bandwidth events per group, from a combination of available total
and local events. Keeping the current interface, users can enable a maximum
of 2 ABMC counters per group. User will also have the option to enable only
one counter to the group. If the system runs out of assignable ABMC
counters, kernel will display an error. Users need to disable an already
enabled counter to make space for new assignments.

The implementation appears to be converging on an interface that can
be generic enough to be used by other features discussed along the way.
"Linux implementation" summary can thus add:

	Create a generic interface aimed to support user space assignment
	of scarce counters used for monitoring. First usage of interface
	is by ABMC with option to expand usage to "soft-RMID" and MPAM
	counters in future.


# Examples

a. Check if ABMC support is available
	#mount -t resctrl resctrl /sys/fs/resctrl/

	#cat /sys/fs/resctrl/info/L3_MON/mbm_mode
	[abmc]
	legacy

	Linux kernel detected ABMC feature and it is enabled.

How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
info file and be the final step to make this generic so that another architecture
can more easily support assignining hardware counters without needing to call
the feature AMD's "abmc".

Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
so that when user space enables that feature its properties can be found in
"num_mbm_cntrs".

The "abmc" kernel parameter remains but that does seem separate from this
resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
making it architecture specific.


b. Check how many ABMC counters are available.

	#cat /sys/fs/resctrl/info/L3_MON/num_cntrs
	32

This is now num_mbm_cntrs


c. Create few resctrl groups.

	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp


d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_control
    to list and modify the group's monitoring states. File provides single place
    to list monitoring states of all the resctrl groups. It makes it easier for
    user space to learn about the counters are used without needing to traverse
    all the groups thus reducing the number of filesystem calls.

	The list follows the following format:

	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Format for specific type of groups:

	* Default CTRL_MON group:
	 "//<domain_id>=<flags>"

        * Non-default CTRL_MON group:
                "<CTRL_MON group>//<domain_id>=<flags>"

        * Child MON group of default CTRL_MON group:
                "/<MON group>/<domain_id>=<flags>"

        * Child MON group of non-default CTRL_MON group:
                "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

        Flags can be one of the following:

         t  MBM total event is enabled.
         l  MBM local event is enabled.
         tl Both total and local MBM events are enabled.
         _  None of the MBM events are enabled

The language needs to be changed here (and in the many copied places) to
be specific about what setting the flag accomplishes. For example, in
"legacy" mode user space can be expected to find all events enabled, no?
Needing a new feature to set a flag to accomplish something that is
possible in legacy mode can thus cause confusion.

If I understand the implementation reading "mbm_control" will fail
if system is ABMC capable but it is disabled. Why can "mbm_control" not
always be displayed to user space? For example, what if "mbm_control" is
always available to user space and it can provide specific information to
user space. For example:
	t  MBM total event is enabled but may not always be counted.
	T  MBM total event is enabled and being counted.

On AMD systems resource groups will have "t" associated with monitor
groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
On Intel systems monitor groups will always have "T".

For "soft-RMID" the flag could possible continue to be "T"?

I am trying to find ways to communicate to user space consistently
and clearly and any insights will be appreciated. We really do not want
to add this interface and then find that it just causes confusion.

It is not quite obvious to me when the new files should be visible and
what they should present to the user. "mbm_mode" is now always visible.
Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
appears to be only associated to ABMC, should it not also, for example,
be the file that "soft-RMID" may use to share how many counters are
available? Its contents will thus be dynamic based on which "MBM mode" is
active, begging the question, what should it contain when "legacy" mode is
enabled, should "num_mbm_cntrs" perhaps show "0" to user space when
"legacy" mode is active?



	Examples:

	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;
	
	There are four groups and all the groups have local and total
	event enabled on domain 0 and 1.

"local and total event" is vague, can it be made specific with, for example,
"local and total MBM events"


	=tl means both total and local events are enabled.

Same here (and all copied places in this series)


	"//" - This is a default CTRL_MON group

	"non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group

	"/child_default_mon_grp/"  - This is Child MON group of the defult group

Same typos as in previous version of cover letter.


	"non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
	MON group of the non-default group

e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_control.

	The write format is similar to the above list format with addition of
	op-code for the assignment operation.
	
	* Default CTRL_MON group:
	        "//<domain_id><op-code><flags>"
	
	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id><op-code><flags>"
	
	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id><op-code><flags>"
	
	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
	
	Op-code can be one of the following:
	
	= Update the assignment to match the flag.
	+ Assign a new state.
	- Unassign a new state.

Please be consistent with terminology. Above switches between "flag"
and "state" while it then continues below using "event". Also,
"Unassign a _new_ state" is unexpected, it should probably be an
_existing_ (not "new") state/flag/event?


	Flags can be one of the following:

         t  MBM total event.
         l  MBM local event.
         tl Both total and local MBM events.
         _  None of the MBM events. Only works with '=' op-code.
	
	Initial group status:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the default group to enable only total event on domain 0:
	# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the MON group child_default_mon_grp to remove total event on domain 1:
	# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	$ cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
	remove both local and total events on domain 1:
	# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
	       /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the default group to add a local event domain 0.
	# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=l;


f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
    There is no change in reading the events with ABMC. If the event is unassigned
    when reading, then the read will come back as "Unassigned".
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
	765207488
	
g. Users will have the option to go back to legacy mbm_mode if required.
    This can be done using the following command. Note that switching the
    mbm_mode will reset all the mbm counters of all resctrl groups.

mbm -> MBM (throughout)


	# echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
	# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
	abmc
	[legacy]

h. Check the bandwidth configuration for the group. Note that bandwidth
    configuration has a domain scope. Total event defaults to 0x7F (to
    count all the events) and local event defaults to 0x15 (to count all
    the local numa events). The event bitmap decoding is available at
    https://www.kernel.org/doc/Documentation/x86/resctrl.rst
    in section "mbm_total_bytes_config", "mbm_local_bytes_config":
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	0=0x7f;1=0x7f
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
	0=0x15;1=0x15
	
j. Change the bandwidth source for domain 0 for the total event to count only reads.
    Note that this change effects total events on the domain 0.
	
	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	0=0x33;1=0x7F
	
k. Now read the total event again. The first read will come back with "Unavailable"
    status. The subsequent read of mbm_total_bytes will display only the read events.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101
	
l. Unmount the resctrl
	
	#umount /sys/fs/resctrl/


Reinette




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux