Hi Reinette,

On 2/28/24 14:04, Reinette Chatre wrote:
> Hi Babu,
>
> On 2/28/2024 9:59 AM, Moger, Babu wrote:
>> On 2/27/24 17:50, Reinette Chatre wrote:
>>> On 2/27/2024 10:12 AM, Moger, Babu wrote:
>>>> On 2/26/24 15:20, Reinette Chatre wrote:
>>>>> On 2/26/2024 9:59 AM, Moger, Babu wrote:
>>>>>> On 2/23/24 16:21, Reinette Chatre wrote:
>>> >
>>>>> For example, if I understand correctly, theoretically, when ABMC is enabled then
>>>>> "num_rmids" can be U32_MAX (after a quick look it is not clear to me why r->num_rmid
>>>>> is not unsigned, tbd if number of directories may also be limited by kernfs).
>>>>> User space could theoretically create more monitor groups than the number of
>>>>> rmids that a resource claims to support using current upstream enumeration.
>>>>
>>>> CPU or task association still uses PQR_ASSOC(MSR C8Fh). There are only 11
>>>> bits(depends on specific h/w) to represent RMIDs. So, we cannot create
>>>> more than this limit(r->num_rmid).
>>>>
>>>> In case of ABMC, h/w uses another counter(mbm_assignable_counters) with
>>>> RMID to assign the monitoring. So, assignment limit is
>>>> mbm_assignable_counters. The number of mon groups limit is still r->num_rmid.
>>>
>>> I see. Thank you for clarifying. This does make enabling simpler and one
>>> less user interface item that needs changing.
>>>
>>> ...
>>>
>>>>>> 2. /sys/fs/resctrl/monitor_state.
>>>>>> This can used to individually assign or unassign the counters in each group.
>>>>>>
>>>>>> When assigned:
>>>>>> #cat /sys/fs/resctrl/monitor_state
>>>>>> 0=total-assign,local-assign;1=total-assign,local-assign
>>>>>>
>>>>>> When unassigned:
>>>>>> #cat /sys/fs/resctrl/monitor_state
>>>>>> 0=total-unassign,local-unassign;1=total-unassign,local-unassign
>>>>>>
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> How do you expect this interface to be used?
>>>>> I understand the mechanics
>>>>> of this interface but on a higher level, do you expect user space to
>>>>> once in a while assign a new counter to a single event or monitor group
>>>>> (for which a fine grained interface works) or do you expect user space to
>>>>> shift multiple counters across several monitor events at intervals?
>>>>
>>>> I think we should provide both the options. I was thinking of providing
>>>> fine grained interface first.
>>>
>>> Could you please provide a motivation for why two interfaces, one inefficient
>>> and one not, should be created and maintained? Users can still do fine grained
>>> assignment with a global assignment interface.
>>
>> Lets consider one by one.
>>
>> 1. Fine grained assignment.
>>
>> It will be part of the mongroup(or control mongroup). User has the access
>> to the group and can query the group's current status before assigning or
>> unassigning.
>>
>> $cd /sys/fs/resctrl/ctrl_mon1
>> $cat /sys/fs/resctrl/ctrl_mon1/monitor_state
>> 0=total-unassign,local-unassign;1=total-unassign,local-unassign;
>>
>> Assign the total event
>>
>> $echo 0=total-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state
>>
>> Assign the local event
>>
>> $echo 0=local-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state
>>
>> Assign both events:
>>
>> $echo 0=total-assign,local-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state
>>
>> Check the assignment status.
>>
>> $cat /sys/fs/resctrl/ctrl_mon1/monitor_state
>> 0=total-assign,local-assign;1=total-unassign,local-unassign;
>>
>> -User interface is simple.
>
> This should not be the only motivation. Please do not sacrifice efficiency
> and usability just to have a simple interface. One can also argue that this
> interface can only be considered simple from the kernel implementation perspective,
> from user space it seems complicated. For example, as James pointed out earlier [1],
> user space would need to walk the entire resctrl to find out where counters are
> assigned.
> Peter also pointed out how the multiple syscalls needed when adjusting
> hundreds of monitor groups is inefficient. Please take all feedback into account.
>
> You consider "simple interface" as a motivation, there seems to be at least two
> arguments against this interface. Please consider these in your comparison
> between interfaces. These are things that should be noted and make their way to
> the cover letter.
>
>>
>> -Assignment will fail if all the h/w counters are exhausted. User needs to
>> unassign a counter from another group and use that counter here. This can
>> be done just querying the monitor state of another group.
>
> Right ... and as you state there can be hundreds of monitor groups that
> user space would need to walk and query to get this information.
>
>>
>> -Monitor group's details(cpus, tasks) are part of the group. So, it is
>> better to have assignment state inside the group.
>
> The assignment state should be clear from the event file.
>
>> Note: Used interface names here just to give example.
>>
>>
>> 2. global assignment:
>>
>> I would assume the interface file will be in /sys/fs/resctrl/info/L3_MON/
>> directory.
>>
>> In case there are 100 mongroups, we need to have a way to list current
>> assignment status for these groups. I am not sure how to list status of
>> these 100 groups.
>
> The kernel has many examples of interfaces that manages status of a large
> number of entities. I am thinking, for example, we can learn a lot from
> how dynamic debug works. On my system I see:
>
> $ wc -l /sys/kernel/debug/dynamic_debug/control
> 5359 /sys/kernel/debug/dynamic_debug/control
>
>>
>> If user is wants to assign the local event(or total) in a specific group
>> in this list of 100 groups, I am not sure how to provide interface for
>> that. Should we pass the name of mongroup? That will involve looping
>> through using the call kernfs_walk_and_get. This may be ok if we are
>> dealing with very small number of groups.
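The counter exhaustion discussed above (an assignment fails once all hardware counters are in use, so a counter first has to be released from another group) can be modelled as a small per-domain pool. This is a hypothetical user-space C sketch, not resctrl code: it assumes counters are allocated per domain (the thread notes this for both ABMC and MPAM), and `struct cntr_state`, `apply_ops()` and the tiny pool sizes are invented purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes, kept tiny so that exhaustion is easy to hit. */
#define NUM_GROUPS       3
#define NUM_DOMAINS      2
#define CNTRS_PER_DOMAIN 2

enum mbm_evt { EVT_TOTAL, EVT_LOCAL, EVT_COUNT };

struct cntr_state {
	int free[NUM_DOMAINS];                             /* unassigned counters per domain */
	bool assigned[NUM_GROUPS][NUM_DOMAINS][EVT_COUNT]; /* group/domain/event holds a counter */
};

struct assign_op {
	int group;
	int domain;
	enum mbm_evt evt;
	bool assign; /* true: take a counter, false: release it */
};

static void cntr_init(struct cntr_state *cs)
{
	memset(cs, 0, sizeof(*cs));
	for (int d = 0; d < NUM_DOMAINS; d++)
		cs->free[d] = CNTRS_PER_DOMAIN;
}

/*
 * Apply ops strictly in the order given. Returns the number of ops applied;
 * a value below n means ops[ret] failed (bad index or no free counter left
 * in its domain) and processing stopped there. Earlier ops are NOT rolled back.
 */
static int apply_ops(struct cntr_state *cs, const struct assign_op *ops, int n)
{
	for (int i = 0; i < n; i++) {
		const struct assign_op *op = &ops[i];
		bool *slot;

		if (op->group < 0 || op->group >= NUM_GROUPS ||
		    op->domain < 0 || op->domain >= NUM_DOMAINS ||
		    (unsigned int)op->evt >= EVT_COUNT)
			return i;
		slot = &cs->assigned[op->group][op->domain][op->evt];
		if (op->assign && !*slot) {
			if (cs->free[op->domain] == 0)
				return i; /* counters exhausted in this domain */
			cs->free[op->domain]--;
			*slot = true;
		} else if (!op->assign && *slot) {
			cs->free[op->domain]++;
			*slot = false;
		}
		assert(cs->free[op->domain] >= 0 &&
		       cs->free[op->domain] <= CNTRS_PER_DOMAIN);
	}
	return n;
}
```

Because ops are applied in the order given, a release in one group followed by an assign in another can be submitted as one batch. The sketch stops at the first failure without undoing earlier ops; all-or-nothing would be an equally valid choice.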
>>
>
> What is your concern when needing to modify a large number of groups?
> Are you concerned about the size of the writes needing to be parsed? It looks
> like kernfs does support writes of larger than PAGE_SIZE, but it is not clear
> to me that such large sizes will be required.
>
> There is also kernfs_find_and_get() that may be more convenient to use.

Will look at this. There are also kernfs_name() and kernfs_path().

> I believe user space needs to provide control group name for a global
> interface (the same name can be used by monitor groups belonging to
> different control groups), and that can be used to narrow search.
>
> Reading your message I do not find any motivation _against_ a global
> interface, except that it is not obvious to you how such interface may look
> or work. That is fair. Peter seems to have ideas and a working implementation
> that can be used as reference. So far I have only seen one comment [2] from James
> that was skeptical about the global interface but the reason notes that MPAM
> allocates counters per domain, which is the same as ABMC so we will need more
> information from James here on what is required since he did not respond to
> Peter.
>
> Below is a *hypothetical* interface to start a discussion that explores how
> to support fine grained assignment in an interface that aims to be easy to use
> by user space. Obviously Peter is also working on something so there
> are many viewpoints to consider.
>
> File info/L3_MON/mbm_assign_control:
> #control_group/mon_group/flags
> ctrl_a/mon_a/00=_;01=_
> ctrl_a/mon_b/00=l;01=t
> ctrl_b/mon_c/00=lt;01=lt

I think you left out a few things here (like the default control_mon group).
To make it clearer, let me list all the groups based on this.
When none of the counters are assigned:

$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
resctrl/00=none,none;01=none,none (#default control_mon group)
resctrl/mon_a/00=none,none;01=none,none (#mon group)
resctrl/ctrl_a/00=none,none;01=none,none (#control_mon group)
resctrl/ctrl_a/mon_ab/00=none,none;01=none,none (#mon group)

When some counters are assigned:

$echo "resctrl/00=total,local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control  # assign counters to the default group
$echo "resctrl/mon_a/00=total;01=total" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control  # assign counters to a mon group
$echo "resctrl/ctrl_a/00=local;01=local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control  # assign counters to a control_mon group
$echo "resctrl/ctrl_a/mon_ab/00=total,local;01=total,local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control  # assign counters to a mon group inside ctrl_a

$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
resctrl/00=total,local;01=none,none (#default control_mon group)
resctrl/mon_a/00=total,none;01=total,none (#mon group)
resctrl/ctrl_a/00=none,local;01=none,local (#control_mon group)
resctrl/ctrl_a/mon_ab/00=total,local;01=total,local (#mon group)

A few comments about this approach:

1. This will involve a lot of text processing in the kernel. I will need to
   figure out which kernel calls to use for this parsing.
2. In this approach there is no way to list the assignment of a single group
   (like resctrl/ctrl_a/mon_ab alone).
3. This is similar to the fine grained approach we discussed, but at the
   global level.

I would like to get Peter's and James's comments on this approach.
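To give a feel for the text processing mentioned in comment 1, here is a user-space C sketch that parses one line of the proposed mbm_assign_control format into a group path and per-domain flags. It is an illustration under assumptions, not a proposed kernel implementation: `struct grp_assign`, `parse_assign_line()` and the size limits are hypothetical, the parser only records which events each listed domain names (how a write would apply them is not decided in this thread), and resolving the path to an actual group (for example with kernfs_find_and_get()) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define GRP_PATH_MAX 256 /* hypothetical limit for this sketch */
#define MAX_DOMAINS  64  /* hypothetical limit for this sketch */

/* One parsed line, e.g. "resctrl/ctrl_a/mon_ab/00=total,local;01=none,local". */
struct grp_assign {
	char path[GRP_PATH_MAX];   /* group path, e.g. "resctrl/ctrl_a/mon_ab" */
	bool present[MAX_DOMAINS]; /* domain appears in the line */
	bool total[MAX_DOMAINS];   /* "total" listed for the domain */
	bool local[MAX_DOMAINS];   /* "local" listed for the domain */
};

/*
 * Returns 0 on success, -1 on a syntax error. Assumes group names contain
 * no '=', so the last '/' before the first '=' ends the group path.
 */
static int parse_assign_line(const char *line, struct grp_assign *ga)
{
	const char *eq, *slash, *s;
	size_t plen;

	assert(line != NULL && ga != NULL);
	memset(ga, 0, sizeof(*ga));

	eq = strchr(line, '=');
	if (!eq)
		return -1;
	for (slash = eq; slash > line && *slash != '/'; slash--)
		;
	plen = (size_t)(slash - line);
	if (*slash != '/' || plen == 0 || plen >= sizeof(ga->path))
		return -1;
	memcpy(ga->path, line, plen);
	ga->path[plen] = '\0';

	/* Domain list: "<id>=<evt>[,<evt>]" entries separated by ';'. */
	s = slash + 1;
	while (*s != '\0' && *s != '\n') {
		char *end;
		long id = strtol(s, &end, 10);

		if (end == s || *end != '=' || id < 0 || id >= MAX_DOMAINS)
			return -1;
		ga->present[id] = true;
		s = end + 1;
		for (;;) {
			size_t n = strcspn(s, ",;\n");

			if (n == 5 && strncmp(s, "total", 5) == 0)
				ga->total[id] = true;
			else if (n == 5 && strncmp(s, "local", 5) == 0)
				ga->local[id] = true;
			else if (n != 4 || strncmp(s, "none", 4) != 0)
				return -1;
			s += n;
			if (*s != ',')
				break;
			s++; /* skip ',' */
		}
		if (*s == ';')
			s++;
	}
	return 0;
}
```

Even this single-line case needs path splitting, number parsing and token matching, and a kernel version would additionally have to look up the group and handle many lines per write.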
>
> Above file displays to user:
> * No counters are assigned to monitor group mon_a within control group ctrl_a
> * Counter for local MBM is assigned to domain 0 of monitor group mon_b within
> control group ctrl_a
> * Counter for total MBM is assigned to domain 1 of monitor group mon_b within
> control group ctrl_a
> * Counters for local and total MBM are assigned to both domains of monitor
> group mon_c within control group ctrl_b
>
> With above interface user space can, with a single read, get insight into
> how counters are assigned across all monitor groups.
> User space can write to the file to modify the flags. If assigning a new
> counter when no more counters are available then the write will fail.
> Potentially, if changes are made in order provided by the user then
> the user will be able to unassign counters from one group and re-assign to
> another group with a single write.
>
> I provide this purely to generate some ideas and gather more thoughts on
> a global interface.
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/2f373abf-f0c0-4f5d-9e22-1039a40a57f0@xxxxxxx/
> [2] https://lore.kernel.org/lkml/1a8c1cd6-a1ce-47a2-bc87-d4cccc84519b@xxxxxxx/

--
Thanks
Babu Moger