Hi Reinette, I am trying to propose few things here to move forward based on my assumptions. Please point me if I missed something. On 2/5/24 16:38, Reinette Chatre wrote: > Hi Babu, > > On 2/2/2024 1:57 PM, Moger, Babu wrote: >> On 2/1/2024 10:09 PM, Reinette Chatre wrote: >>> On 1/19/2024 10:22 AM, Babu Moger wrote: >>>> These series adds the support for Assignable Bandwidth Monitoring Counters >>> Not a good start ([1]). >> >> Yea. My bad. >> >>> >>>> (ABMC). It is also called QoS RMID Pinning feature >>>> >>>> The feature details are documented in the APM listed below [1]. >>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming >>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth >>>> Monitoring (ABMC). The documentation is available at >>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 >>>> >>>> The patches are based on top of commit >>>> 1ac6b49423e83af2abed9be7fbdf2e491686c66b (tip/master) >>>> >>>> # Introduction >>>> >>>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring >>>> feature only guarantees that RMIDs currently assigned to a processor will >>>> be tracked by hardware. The counters of any other RMIDs which are no longer >>>> being tracked will be reset to zero. The MBM event counters return >>>> "Unavailable" for the RMIDs that are not active. >>>> Users can create 256 or more monitor groups. But there can be only limited >>>> number of groups that can be give guaranteed monitoring numbers. With ever >>> "can be given"? >> >> "can give guaranteed monitoring numbers." >> >> I feel this looks better. > > Sounds good. Thank you. > >> >>> >>>> changing configurations there is no way to definitely know which of these >>>> groups will be active for certain point of time. Users do not have the >>>> option to monitor a group or set of groups for certain period of time >>>> without worrying about RMID being reset in between. >>>> The ABMC feature provides an option to the user to assign an RMID to the >>>> hardware counter and monitor the bandwidth for a longer duration. >>>> The assigned RMID will be active until the user unassigns it manually. >>>> There is no need to worry about counters being reset during this period. >>>> Additionally, the user can specify a bitmask identifying the specific >>>> bandwidth types from the given source to track with the counter. >>>> >>>> Without ABMC enabled, monitoring will work in current mode without >>>> assignment option. >>>> >>>> # Linux Implementation >>>> >>>> Linux resctrl subsystem provides the interface to count maximum of two >>>> memory bandwidth events per group, from a combination of available total >>>> and local events. Keeping the current interface, users can assign a maximum >>>> of 2 ABMC counters per group. User will also have the option to assign only >>>> one counter to the group. If the system runs out of assignable ABMC >>>> counters, kernel will display an error. Users need to unassign an already >>>> assigned counter to make space for new assignments. >>>> >>>> >>>> # Examples >>>> >>>> a. Check if ABMC support is available >>>> #mount -t resctrl resctrl /sys/fs/resctrl/ >>>> >>>> #cat /sys/fs/resctrl/info/L3_MON/mon_features >>>> llc_occupancy >>>> mbm_total_bytes >>>> mbm_total_bytes_config >>>> mbm_local_bytes >>>> mbm_local_bytes_config >>>> mbm_assign_capable ← Linux kernel detected ABMC feature >>>> >>>> b. Check if ABMC is enabled. By default, ABMC feature is disabled. >>>> Monitoring works in legacy monitor mode when ABMC is not enabled. >>>> >>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable >>>> 0 >>>> >>> With the introduction of "mbm_assign_enable" the entry in mon_features seems >>> to provide duplicate information. >> >> ok. We can remove the text in mon_features and keep mbm_assign_enable. We need this to enable and disable the feature. > > This could be improved beyond a binary "enable"/"disable" interface to user space. > For example, the hardware can discover which "mbm counter assign" related feature > (I'm counting the "soft RMID" here as one of the "mbm counter assign" related > features) is supported on the platform and it can be presented to the user like: > > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign > [feature_1] feature_2 feature_3 How about this? # cat /sys/fs/resctrl/info/L3_MON/mbm_assign ABMC:Capable SOFT-RMID:Capable To enable ABMC # echo ABMC:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign When ABMC is enabled: # cat /sys/fs/resctrl/info/L3_MON/mbm_assign ABMC:Enable SOFT-RMID:Capable > The output indicates which features are supported by the platform and the brackets indicate > which feature is enabled. > > >>>> c. There will be new file "monitor_state" for each monitor group when ABMC >>>> feature is supported. However, monitor_state is not available if ABMC is >>>> disabled. >>>> >>>> #cat /sys/fs/resctrl/monitor_state >>>> Unsupported >>> This sounds potentially confusing since users will still be able to monitor >>> the groups ... >> How about "Assignment-Unsupported"? > > (please see later) > >>> >>>> >>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Without ABMC >>>> enabled, monitoring will work in current mode without assignment option. >>>> >>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes >>>> 779247936 >>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes >>>> 765207488 >>>> >>>> e. Enable ABMC mode. >>>> >>>> #echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable >>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable >>>> 1 >>>> >>>> f. Read the monitor states. By default, both total and local MBM >>>> events are in "unassign" state. >>>> >>>> #cat /sys/fs/resctrl/monitor_state >>>> total=unassign;local=unassign >>> This interface does not seem to take into account that hardware >>> can support assignment per domain. I understand that this is >>> not something you want to implement at this time but the user interface >>> has to accommodate such an enhancement. This was already mentioned, and >>> you did acknowledge the point [3] to this new version that does not >>> reflect this is unexpected. >> >> Yea. Domain level assignment is not supported at this point. Do you want me to explicitly mention here? >> >> Please elaborate what you meant here. > > You have made it clear on several occasions that you do not intend to support > domain level assignment. That may be ok but the interface you create should > not prevent future support of domain level assignment. > > If my point is not clear, could you please share how this interface is able to > support domain level assignment in the future? > > I am starting to think that we need a file similar to the schemata file > for group and domain level monitor configurations. Something like this? By default #cat /sys/fs/resctrl/monitor_state default:0=total=assign,local=assign;1=total=assign,local=assign With ABMC, #cat /sys/fs/resctrl/monitor_state ABMC:0=total=unassign,local=unassign;1=total=unassign,local=unassign > >>> My previous suggestions do seem to still stand and and I also am not able to >>> see how Peter's requests [2] were considered. This same interface needs to >>> accommodate usages apart from ABMC. For example, how to use this interface >>> to address the same counter issue on AMD hardware without ABMC, and MPAM >>> (pending James's feedback). >> >> Yea. Agree. Peter's comments are not addressed. I am not all clear >> about details of Peters and James requirement. > > Peter listed his requirements in [1]. That email thread is a worthwhile read > for the use cases. > > I believe that James is aware of this work and do hope to hear from him. > >> >> With respect to ABMC here are my requirements. >> >> a. Assignment needs to be done at group level. >> >> b. User should be able to assign each event individually. Multiple events assignment(in one command) should be supported. >> >> c. I have no plans to implement domain level assignment. It is done at system level. >> >> d. We need only couple of states. Assigned and unassigned. >> >> e. monitor_state is name of file for user interface. We can change that based on comments. >> >> Peter, James, >> >> Please comment on what you want achieve in "assignment" based on the features you are working on. >> >> Do you want to add new states? >> >>> >>> I understand that until we hear from Arm we do not know all the requirements >>> that this interface needs to support, but I do expect this interface to >>> at least consider requirements and usage scenarios that are already known. >> >> Sure. Will try that in the next version. Lets continue the discussion. >> >> >>>> g. Read the event mbm_total_bytes and mbm_local_bytes. In ABMC mode, >>>> the MBA events are not available until the user assigns the events >>>> explicitly. >>>> >>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes >>>> Unsupported >>>> >>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes >>>> Unsupported >>>> >>> This needs some more thought to accommodate Peter's scenario where the counter >>> can be expected to return the final count after the counter is disabled. >> >> I am not sure how to achieve this with ABMC. This may be applicable >> to soft rmid only. In case of "soft rmid", previous readings are >> saved in the soft rmid state. > > Right. Please consider this work in two parts, first, there is a generic > interface that aims to support ABMC, "soft RMID", and MPAM. Second, there > is using this interface to support ABMC. Yea. But it is tough without knowing all the details of the other features. How about? #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes ABMC:Unassigned #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes ABMC:Unassigned > >>>> h. The event llc_occupancy is not affected by ABMC mode. Users can still >>>> read the llc_occupancy. >>>> >>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy >>>> 557056 >>>> >>>> i. Now assign the total event and read the monitor_state. >>>> >>>> #echo total=assign > /sys/fs/resctrl/monitor_state >>>> #cat /sys/fs/resctrl/monitor_state >>>> total=assign;local=unassign >>>> >>> I do not see the "global assign/unassign" scenario addressed. >> >> I am not all clear on meaning of "global assign/unassign". Does it >> mean looping thru all the groups and assign the RMIDs? > > Please see [1]. > > >> It may not work in many cases. In case of ABMC, we have only limited >> number of hw counters. It will fail after hardware runs out of >> counters. It is better done selectively based on which group user is >> interested in. > > Right. This is one more item where the generic interface needs to > accommodate different hardware implementations. Perhaps this could > be one of the "features" exposed by (global) mbm_assign that the > user can "enable"/"disable" on demand? > >> But it can be done later if we find a use case for that. > > There already exists a use case as presented by Peter in support > of AMD hardware without ABMC, no? Yes. There is a use case. But seems like the use case is mostly applicable to soft-rmid feature. We can tie the global assign only to soft-rmid. # echo SOFT-RMID:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign Because this is soft-rmid, call global assign method. # echo ABMC:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign Because this is ABMC, do the steps required just to enable ABMC. Don't do individual assignment > >>> This version seems to ignore (without discussion) a lot of earlier >>> feedback. >> >> Please feel free comment. There are various threads of discussion. I may have missed. >> > > > Reinette > > [1] https://lore.kernel.org/lkml/CALPaoCiRD6j_Rp7ffew+PtGTF4rWDORwbuRQqH2i-cY5SvWQBg@xxxxxxxxxxxxxx/ -- Thanks Babu Moger