Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)


 



Hi Reinette,

On 2/29/24 15:50, Reinette Chatre wrote:
Hi Babu,

On 2/29/2024 12:37 PM, Moger, Babu wrote:
On 2/28/24 14:04, Reinette Chatre wrote:
On 2/28/2024 9:59 AM, Moger, Babu wrote:
On 2/27/24 17:50, Reinette Chatre wrote:
On 2/27/2024 10:12 AM, Moger, Babu wrote:
On 2/26/24 15:20, Reinette Chatre wrote:
On 2/26/2024 9:59 AM, Moger, Babu wrote:
On 2/23/24 16:21, Reinette Chatre wrote:


For example, if I understand correctly, theoretically, when ABMC is enabled then
"num_rmids" can be U32_MAX (after a quick look it is not clear to me why r->num_rmid
is not unsigned, tbd if number of directories may also be limited by kernfs).
User space could theoretically create more monitor groups than the number of
rmids that a resource claims to support using current upstream enumeration.

CPU or task association still uses PQR_ASSOC (MSR C8Fh). There are only 11
bits (depending on the specific h/w) to represent RMIDs. So, we cannot create
more groups than this limit (r->num_rmid).

In the case of ABMC, the h/w uses another counter (mbm_assignable_counters) with
the RMID to assign the monitoring. So, the assignment limit is
mbm_assignable_counters. The limit on the number of mon groups is still r->num_rmid.

I see. Thank you for clarifying. This does make enabling simpler and one
less user interface item that needs changing.

...

2. /sys/fs/resctrl/monitor_state.
This can be used to individually assign or unassign the counters in each group.

When assigned:
#cat /sys/fs/resctrl/monitor_state
0=total-assign,local-assign;1=total-assign,local-assign

When unassigned:
#cat /sys/fs/resctrl/monitor_state
0=total-unassign,local-unassign;1=total-unassign,local-unassign


Thoughts?

How do you expect this interface to be used? I understand the mechanics
of this interface but on a higher level, do you expect user space to
once in a while assign a new counter to a single event or monitor group
(for which a fine grained interface works) or do you expect user space to
shift multiple counters across several monitor events at intervals?

I think we should provide both options. I was thinking of providing
the fine grained interface first.

Could you please provide a motivation for why two interfaces, one inefficient
and one not, should be created and maintained? Users can still do fine grained
assignment with a global assignment interface.

Let's consider them one by one.

1. Fine grained assignment.

It will be part of the mongroup (or control mongroup). The user has access
to the group and can query the group's current status before assigning or
unassigning.

   $cd /sys/fs/resctrl/ctrl_mon1
   $cat /sys/fs/resctrl/ctrl_mon1/monitor_state
       0=total-unassign,local-unassign;1=total-unassign,local-unassign;

Assign the total event

  $echo 0=total-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state

Assign the local event

   $echo 0=local-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state

Assign both events:

   $echo 0=total-assign,local-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state

Check the assignment status.

   $cat /sys/fs/resctrl/ctrl_mon1/monitor_state
       0=total-assign,local-assign;1=total-unassign,local-unassign;

-User interface is simple.

This should not be the only motivation. Please do not sacrifice efficiency
and usability just to have a simple interface. One can also argue that this
interface can only be considered simple from the kernel implementation perspective,
from user space it seems complicated. For example, as James pointed out earlier [1],
user space would need to walk the entire resctrl to find out where counters are
assigned. Peter also pointed out how the multiple syscalls needed when adjusting
hundreds of monitor groups is inefficient. Please take all feedback into account.

You consider "simple interface" as a motivation, but there seem to be at least two
arguments against this interface. Please consider these in your comparison
between interfaces. These are things that should be noted and make their way to
the cover letter.


-Assignment will fail if all the h/w counters are exhausted. The user needs to
unassign a counter from another group and use that counter here. This can
be done by just querying the monitor state of another group.

Right ... and as you state there can be hundreds of monitor groups that
user space would need to walk and query to get this information.
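
To illustrate the cost being discussed, here is a minimal user-space sketch of that walk. It assumes the per-group file is called monitor_state and uses the "N=total-assign,local-unassign;" format from the examples earlier in this thread; both are placeholder names from the proposal, not an existing resctrl interface. The helper names are hypothetical.

```python
import os


def collect_monitor_state(root):
    """Map each group directory (relative to root) to its monitor_state text.

    Every group visited costs at least an open, a read, and a close.
    """
    states = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "monitor_state" in filenames:  # hypothetical per-group file
            with open(os.path.join(dirpath, "monitor_state")) as f:
                states[os.path.relpath(dirpath, root)] = f.read().strip()
    return states


def has_assigned_counter(state):
    """True if any event in any domain is marked '<event>-assign'."""
    for domain in state.strip(";").split(";"):
        _, _, events = domain.partition("=")
        if any(ev.endswith("-assign") for ev in events.split(",")):
            return True
    return False


def groups_with_counters(states):
    """Return the sorted list of groups that currently hold a counter."""
    return sorted(g for g, s in states.items() if has_assigned_counter(s))
```

With hundreds of groups, user space would have to repeat this walk whenever it needs to find a counter to free, which is the inefficiency raised above.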


-Monitor group's details(cpus, tasks) are part of the group. So, it is
better to have assignment state inside the group.

The assignment state should be clear from the event file.

Note: The interface names used here are only examples.


2. global assignment:

I would assume the interface file will be in /sys/fs/resctrl/info/L3_MON/
directory.

In case there are 100 mongroups, we need to have a way to list current
assignment status for these groups. I am not sure how to list status of
these 100 groups.

The kernel has many examples of interfaces that manage the status of a large
number of entities. I am thinking, for example, we can learn a lot from
how dynamic debug works. On my system I see:

$ wc -l /sys/kernel/debug/dynamic_debug/control
5359 /sys/kernel/debug/dynamic_debug/control


If the user wants to assign the local event (or total) in a specific group
in this list of 100 groups, I am not sure how to provide an interface for
that. Should we pass the name of the mongroup? That will involve looping
through using the call kernfs_walk_and_get. This may be ok if we are
dealing with a very small number of groups.


What is your concern when needing to modify a large number of groups?
Are you concerned about the size of the writes needing to be parsed? It looks
like kernfs does support writes of larger than PAGE_SIZE, but it is not clear
to me that such large sizes will be required.
There is also kernfs_find_and_get() that may be more convenient to use.

Will look at this. There is also kernfs_name and kernfs_path.

I believe user space needs to provide control group name for a global
interface (the same name can be used by monitor groups belonging to
different control groups), and that can be used to narrow search.

Reading your message I do not find any motivation _against_ a global
interface, except that it is not obvious to you how such interface may look
or work. That is fair. Peter seems to have ideas and a working implementation
that can be used as reference. So far I have only seen one comment [2] from James
that was skeptical about the global interface but the reason notes that MPAM
allocates counters per domain, which is the same as ABMC so we will need more
information from James here on what is required since he did not respond to
Peter.

Below is a *hypothetical* interface to start a discussion that explores how
to support fine grained assignment in an interface that aims to be easy to use
by user space. Obviously Peter is also working on something so there
are many viewpoints to consider.

File info/L3_MON/mbm_assign_control:
#control_group/mon_group/flags
ctrl_a/mon_a/00=_;01=_
ctrl_a/mon_b/00=l;01=t
ctrl_b/mon_c/00=lt;01=lt

I think you left out a few things here (like the default control_mon group).

No. Similar to proc_resctrl_show() the fields can be empty for
the default group or mon groups belonging to control group.
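
A minimal sketch of how user space might parse the hypothetical mbm_assign_control listing above. It assumes exactly three "/"-separated fields per line, decimal domain ids, "_" meaning no counters, and empty group fields for the default group; none of this is a settled format. The function names are hypothetical.

```python
def parse_assign_line(line):
    """Split 'ctrl/mon/NN=flags;NN=flags' into (ctrl, mon, {domain: flags}).

    Empty ctrl or mon fields are returned as empty strings.
    """
    ctrl, mon, domains = line.strip().split("/", 2)
    flags_by_domain = {}
    for entry in domains.split(";"):
        domain_id, _, flags = entry.partition("=")
        # "_" is assumed to mean "no counters assigned in this domain".
        flags_by_domain[int(domain_id)] = "" if flags == "_" else flags
    return ctrl, mon, flags_by_domain


def parse_assign_listing(text):
    """Parse a whole listing, skipping blank lines and '#' header lines."""
    return [
        parse_assign_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
```

Because the group fields are always present, even when empty, every line splits the same way and no special case is needed for the default group.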

ok. Need to understand this better. Hope I learn while doing this work.



To make this more clear, let me list all the groups here based on this.

When none of the counters are assigned:

$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
resctrl/00=none,none;01=none,none (#default control_mon group)
resctrl/mon_a/00=none,none;01=none,none (#mon group)
resctrl/ctrl_a/00=none,none;01=none,none (#control_mon group)
resctrl/ctrl_a/mon_ab/00=none,none;01=none,none (#mon group)

I am concerned that inconsistent use of "/" will make parsing hard.

Do you mean, you don't want to see multiple "/"?

resctrl/ctrl_a/mon_ab/

Change to

mon_ab/


I find "resctrl" and all the "none" redundant. It is not clear what
this improves.
Why have:
resctrl/00=none,none;01=none,none
when this could do:
//00=_;01=_

ok.

"//" meaning root of resctrl filesystem?




When some counters are assigned:

$echo "resctrl/00=total,local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
(#assigning counters to the default group)

$echo "resctrl/mon_a/00=total;01=total" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
(#assigning counters to a mon group)

$echo "resctrl/ctrl_a/00=local;01=local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

$echo "resctrl/ctrl_a/mon_ab/00=total,local;01=total,local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control


We could learn some more lessons from dynamic debug (see Documentation/admin-guide/dynamic-debug-howto.rst). For example, "=" can be used to make an assignment while "+"
can be used to add a counter and "-" can be used to remove a counter.
"=_" can be used to remove counters from all events in that domain.

Yes. Looked at dynamic debug. I am still learning this interface. Some examples below based on my understanding.

To assign counters to the default group on domains 0 and 1.
$echo "//00=+lt;01=+lt" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

To assign counters to a mon group inside the default group.
$echo "mon_a/00=+t;01=+t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

To assign counters to a control mon group inside the default group.
$echo "ctrl_a/00=+l;01=+l"  > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

To assign counters to a mon group inside another control group.
$echo "mon_ab/00=+lt;01=+lt" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

To unassign counters from a mon group inside another control group.
$echo "mon_ab/00=-lt;01=-lt" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

To unassign all the counters on a specific group.
$echo "mon_ab/00=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

It does not matter whether it is a control group or a mon group. We just need the name of the group in this interface.
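
The "=", "+", and "-" semantics discussed above can be sketched as a small state update. This assumes the one-letter flags "t" (total) and "l" (local) and "_" for "no counters", as used in the examples in this thread; the function name and the error handling are hypothetical and only illustrate the intended behaviour.

```python
VALID_FLAGS = {"t", "l"}  # assumed: t = total event, l = local event


def apply_op(current, op, flags):
    """Return the new set of assigned events for one domain.

    current: set of flags currently assigned, e.g. {"l"}
    op:      "=" (replace), "+" (add) or "-" (remove)
    flags:   flag letters such as "lt", or "_" for none
    """
    requested = set() if flags == "_" else set(flags)
    unknown = requested - VALID_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    if op == "=":
        return requested
    if op == "+":
        return set(current) | requested
    if op == "-":
        return set(current) - requested
    raise ValueError(f"unknown operator: {op!r}")
```

With this reading, "=_" clears every event in the domain, while "+" and "-" leave events that are not named untouched.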

Listing will be

$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//00=lt;01=lt
/mon_a/00=t;01=t
/ctrl_a/00=l;01=l
/mon_ab/00=_;01=_


The interface should also support assign/un-assign to multiple groups with
a single write. To start this could use '\n' as separator as is the custom
with other resctrl interfaces.

Yes, that should be fine.
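
A minimal sketch of how user space could batch updates for several groups into one write, assuming the "ctrl/mon/NN=flags" line format and the '\n' separator discussed in this thread. The helper names are hypothetical and the target path is whatever file the final interface ends up using.

```python
import os


def build_assign_request(updates):
    """Build one newline-separated payload.

    updates: iterable of (ctrl, mon, {domain_id: flags}); empty flags
    are written as "_" and empty group names are left empty.
    """
    lines = []
    for ctrl, mon, domains in updates:
        dom_part = ";".join(
            f"{dom:02d}={flags or '_'}" for dom, flags in sorted(domains.items())
        )
        lines.append(f"{ctrl}/{mon}/{dom_part}")
    return "\n".join(lines) + "\n"


def write_assign_request(path, payload):
    """Issue the payload as a single write() call; return bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, payload.encode())
    finally:
        os.close(fd)
```

Compared with one open/write/close per group, this keeps the syscall count constant regardless of how many groups are adjusted.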


$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
resctrl/00=total,local;01=none,none (#default control_mon group)
resctrl/mon_a/00=total,none;01=total,none (#mon group)
resctrl/ctrl_a/00=none,local;01=none,local (#control_mon group)
resctrl/ctrl_a/mon_ab/00=total,local;01=total,local (#mon group)


A few comments about this approach:
1. This will involve a lot of text processing in the kernel. I will need to
figure out the calls for this processing.

I see that additional parsing will be needed to determine control group
and monitor group. For these it sounds like you already have a few options
for kernfs API to use.
Apart from that, parsing the counter assignment will be similar to what
was done in your previous versions. I think parsing will be easier if it
does not try to use words for the events but just use one letter flags.
For example, there is thus no need to look for "," in the parsing of the
events, just parse one character at a time where each character has a
specific meaning.
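
The one-character-at-a-time parsing described above can be sketched as follows. This is a user-space illustration rather than kernel code, and it assumes "t" means the total event, "l" the local event, and a lone "_" means no events; the table and function name are hypothetical.

```python
FLAG_EVENTS = {"t": "total", "l": "local"}  # assumed flag letters


def decode_flags(flags):
    """Decode a flag string such as "lt" into a set of event names.

    Each character is examined on its own, so no separator such as ","
    has to be searched for.
    """
    events = set()
    for ch in flags:
        if ch == "_":
            # "_" is only meaningful on its own.
            if len(flags) != 1:
                raise ValueError("'_' cannot be combined with other flags")
            return set()
        if ch not in FLAG_EVENTS:
            raise ValueError(f"unknown flag: {ch!r}")
        events.add(FLAG_EVENTS[ch])
    return events
```

Adding a new event later would only require a new entry in the table, not a change to the loop.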

ok.



2. In this approach there is no way to list the assignment of a single
group (like group resctrl/ctrl_a/mon_ab alone).

Should the kernel be responsible for enabling this? User space can just
do a "cat mbm_assign_control | grep mon_ab". Is this not sufficient?
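
For user space that wants more than a substring match, here is a minimal sketch of the same filtering done on whole fields. It assumes the three-field "ctrl/mon/flags" listing format proposed earlier in this thread; the function name is hypothetical.

```python
def lines_for_group(listing, group):
    """Return listing lines whose control or monitor group field is group.

    Unlike a plain substring search, "mon_ab" does not match "mon_abc".
    """
    matches = []
    for line in listing.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("/")
        # The last field holds the per-domain flags, not a group name.
        if group in fields[:-1]:
            matches.append(line)
    return matches
```

The plain grep is likely sufficient when group names are not prefixes of one another.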

That may be ok. Peter, please comment on this.



3. This is similar to the fine grained approach we discussed, but at a global level.

That is what I have been trying to get across. This has full benefit of the
original implementation while also addressing all problems raised against it.


I want to get Peter's and James's comments about this approach.

Of course. I look forward to that. Once agreed it may also be worthwhile to
approach x86 maintainers with an RFC of the proposed new user interface to get
their guidance. This is where it is important to keep track of all the requirements,
as well as pros and cons of different options.

Ok. Sure. I am fine with making the next version an RFC.


Reinette

--
Thanks
Babu Moger



