Hi Babu,

On 11/18/24 11:04 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 11/15/24 18:00, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 10/29/24 4:21 PM, Babu Moger wrote:
>>> Introduce the interface file "mbm_assign_mode" to list the supported
>>> monitor modes.
>>>
>>> The "mbm_cntr_assign" mode provides the option to assign a counter to
>>> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>>>
>>> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
>>> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>>>
>>> The "default" mode is the existing monitoring mode that works without
>>> explicit counter assignment, instead relying on dynamic counter assignment
>>> by hardware. This may result in hardware not dedicating a counter, with
>>> monitoring data reads then returning "Unavailable".
>>>
>>> Provide an interface to display the monitor mode on the system.
>>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>> [mbm_cntr_assign]
>>> default
>>>
>>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>>> ---
...
>> I'm concerned that users with Intel platforms may want to use the "mbm_cntr_assign" mode
>> to make the event data "more predictable" and then be concerned when the mode does
>> not exist.
>>
>> As an alternative, is it possible to know the number of hardware counters on AMD systems
>> without ABMC? I wonder if we could perhaps always expose num_mbm_cntrs as a way for
>> users to know if their platform may be impacted by this type of "unpredictability" (by comparing
>> num_mbm_cntrs to num_rmids).
>
> There is a roundabout (or hacky) way to find out the number of RMIDs
> that can be active.

Does this give consistent and accurate data? Is this something that can be added to resctrl? (From your other message [1], it does not sound as though it can produce an accurate number at boot.) If not, then it will be up to the documentation to be accurate.
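For illustration only, the comparison suggested above (num_mbm_cntrs against num_rmids) could look like this from user space. This is a minimal sketch that assumes resctrl is mounted at /sys/fs/resctrl and that a num_mbm_cntrs file is exposed under info/L3_MON as proposed in this series; on kernels without that file the result is reported as unknown. The helper names are hypothetical, and treating "fewer counters than RMIDs" as the contention condition is an assumption.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the resctrl L3 monitoring info directory.
L3_MON = Path("/sys/fs/resctrl/info/L3_MON")


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a resctrl info file, or None if absent/unparsable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def counters_may_be_contended(num_rmids: Optional[int],
                              num_mbm_cntrs: Optional[int]) -> Optional[bool]:
    """Hypothetical check: True if there are fewer counters than RMIDs.

    Returns None when either value is unknown, e.g. when the proposed
    num_mbm_cntrs file is not exposed by the running kernel.
    """
    if num_rmids is None or num_mbm_cntrs is None:
        return None
    return num_mbm_cntrs < num_rmids


if __name__ == "__main__":
    rmids = read_int(L3_MON / "num_rmids")
    cntrs = read_int(L3_MON / "num_mbm_cntrs")  # proposed file; may be missing
    print(f"num_rmids={rmids} num_mbm_cntrs={cntrs} "
          f"contended={counters_may_be_contended(rmids, cntrs)}")
```

A None result deliberately means "cannot tell", which is the case the documentation would then have to cover.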
>>> +
>>> + AMD platforms with the ABMC (Assignable Bandwidth Monitoring Counters)
>>> + feature enable this mode by default so that counters remain assigned even
>>> + when the corresponding RMID is not in use by any processor.
>>> +
>>> + "default":
>>> +
>>> + In default mode, resctrl assumes there is a hardware counter for each
>>> + event within every CTRL_MON and MON group. Reading mbm_total_bytes or
>>> + mbm_local_bytes may report 'Unavailable' if there is no counter associated
>>> + with that event.
>>
>> If I understand correctly, on AMD platforms without ABMC the events only report
>> "Unavailable" if there is no counter assigned at the time of the query. If a counter
>> is unassigned and then reassigned, the event count will reset and the user
>> will get some data back, but it may thus be unpredictable (to match earlier language).
>> Is this correct? Any AMD platform in "default" mode may thus be vulnerable to
>> "unpredictable" event counts (not just "Unavailable") ... this gets complicated
>
> Yes. All the AMD systems without ABMC are affected by this problem.
>
>> because users should be steered to avoid "default" mode if mbm_assign_mode is
>> available, while not being made concerned about using "default" mode on Intel,
>> where mbm_assign_mode is not available.
>
> Can we add text to clarify this?

Please do.

Reinette

[1] https://lore.kernel.org/all/35fc70fd-0281-4ac8-b32b-efa2f4516901@xxxxxxx/
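The steering discussed in this thread (prefer "mbm_cntr_assign" when mbm_assign_mode offers it, and treat "default" as the only option where the file does not exist) could be sketched in user space as follows. This is a minimal sketch, assuming the active mode is the single bracketed entry, as in the "[mbm_cntr_assign]" example output; both function names and the suggestion policy are hypothetical and not part of resctrl.

```python
from typing import List, Optional, Tuple


def parse_assign_modes(text: str) -> Tuple[List[str], Optional[str]]:
    """Parse mbm_assign_mode contents into (supported modes, active mode).

    Assumes the active mode is the one entry wrapped in square brackets,
    e.g. "[mbm_cntr_assign]".
    """
    modes: List[str] = []
    active: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("[") and line.endswith("]"):
            line = line[1:-1]  # strip brackets marking the active mode
            active = line
        modes.append(line)
    return modes, active


def should_suggest_cntr_assign(text: Optional[str]) -> bool:
    """Hypothetical policy: suggest mbm_cntr_assign if supported but not active.

    text is None when mbm_assign_mode does not exist (e.g. on Intel, per
    this thread); "default" is then the only mode and nothing is suggested.
    """
    if text is None:
        return False
    modes, active = parse_assign_modes(text)
    return "mbm_cntr_assign" in modes and active != "mbm_cntr_assign"
```

Passing None for a missing file keeps the Intel case quiet, which matches the concern about not alarming users where mbm_assign_mode is unavailable.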