Hi Reinette, On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote: > Hi Dave, > > On 2/12/25 9:46 AM, Dave Martin wrote: > > Hi there, > > > > On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote: > >> > >> This series adds the support for Assignable Bandwidth Monitoring Counters > >> (ABMC). It is also called QoS RMID Pinning feature > >> > >> Series is written such that it is easier to support other assignable > >> features supported from different vendors. > >> > >> The feature details are documented in the APM listed below [1]. > >> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming > >> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth > >> Monitoring (ABMC). The documentation is available at > >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 > >> > >> The patches are based on top of commit > >> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx' [...] > >> b. Check how many ABMC counters are available. > >> > >> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs > >> 32 > > > > Is this file needed? > > > > With MPAM, it is more difficult to promise that the same number of > > counters will be available everywhere. > > > > Rather than lie, or report a "safe" value here that may waste some > > counters, can we just allow the number of counters to be be discovered > > per domain via available_mbm_cntrs? > > This sounds reasonable to me. I think us having trouble with the > user documentation of this file so late in development should also have been > a sign to rethink its value. > > For a user to discover the number of counters supported via available_mbm_cntrs > would require the file's contents to be captured right after mount. Since we've > had scenarios where new userspace needs to discover an up-and-running system's > configuration this may not be possible. I thus wonder instead of removing > num_mbm_cntrs, it could be modified to return the per-domain supported counters > instead of a single value? Is it actually useful to be able to discover the number of counters that exist? A counter that exists but is not available cannot be used, so perhaps it is not useful to know about it in the first place. But if we keep this file but make it report the number of counters for each domain (similarly to mbm_available_cntrs), then I think the MPAM driver should be able to work with that. > > num_closids and num_rmids are already problematic for MPAM, so it would > > be good to avoid any more parameters of this sort from being reported > > to userspace unless there is a clear understanding of why they are > > needed. > > Yes. Appreciate your help in identifying what could be problematic for MPAM. For clarity: this is a background issue, mostly orthogonal to this series. If this series is merged as-is, with a global per-resource num_mbm_cntrs property, then this not really worse than the current situation -- it's just a bit annoying from the MPAM perspective. In a nutshell, the num_closids / num_rmids parameters seem to expose RDT-specific hardware semantics to userspace, implying a specific allocation model for control group and monitoring group identifiers. The guarantees that userspace is entitled to asssume when resctrl reports particular values do not seem to be well described and are hard to map onto the nearest-equivalent MPAM implementation. A combination of control and monitoring groups that can be created on x86 may not be creatable on MPAM, even when the number of supportable control and monitoring partitions is the same. Even with the ABMC series, we may still be constrained on what we can report for num_rmids: we can't know in advance whether or not the user is going to use mbm_cntr_assign mode -- if not, we can't promise to create more monitoring groups than the number of counters in the hardware. It seems natural for the counts reported by "available_mbm_cntrs" to change dynamically when the ABMC assignment mode is changed, but I think userspace are likely to expect the global "num_rmids" parameters to be fixed for the lifetime of the resctrl mount (and possibly fixed for all time on a given hardware platform -- at least, modulo CDP). I think it might be possible to tighten up the docmentation of num_closids in particular in a way that doesn't conflict with x86 and may make it easier for MPAM to fit in with, but that feels like a separate conversation. None of this should be considered a blocker for this series, either way. > > > > Reporting number of counters per monitoring domain is a more natural > > fit for MPAM, as below: > > > >> c. Check how many ABMC counters are available in each domain. > >> > >> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs > >> 0=30;1=30 > > > > For MPAM, this seems supportable. Each monitoring domain will have > > some counters, and a well-defined number of them will be available for > > allocation at any one time. [...] > >> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control [...] > >> Flags can be one of the following: > >> > >> t MBM total event is enabled. > >> l MBM local event is enabled. > >> tl Both total and local MBM events are enabled. > >> _ None of the MBM events are enabled > >> > >> Examples: > > > > [...] > > > > I think that this basically works for MPAM. > > > > The local/total distinction doesn't map in a consistent way onto MPAM, > > but this problem is not specific to ABMC. It feels sensible for ABMC > > to be built around the same concepts that resctrl already has elsewhere > > in the interface. MPAM will do its best to fit (as already). > > > > Regarding Peter's use case of assiging multiple counters to a > > monitoring group [1], I feel that it's probably good enough to make > > sure that the ABMC interface can be extended in future in a backwards > > compatible way so as to support this, without trying to support it > > immediately. > > > > [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@xxxxxxxxxxxxxx/ > > > > I do not think that resctrl's current support of the mbm_total_bytes and > mbm_local_bytes should be considered as the "only" two available "slots" > into which all possible events should be forced into. "mon_features" exists > to guide user space to which events are supported and as I see it new events > can be listed here to inform user space of their availability, with their > associated event files available in the resource groups. That's fair. I wasn't currently sure how (or if) the set of countable events was expected to grow / evolve via this route. Either way, I think this confirms that there is at least one viable way to enable more counters for a single control group, on top of this series. (If there is more than one way, that seems fine?) > > > > For example, if we added new generic "letters" -- say, "0" to "9", > > combined with new counter files in resctrlfs, that feels like a > > possible approach. ABMC (as in this series) should just reject such > > such assignments, and the new counter files wouldn't exist. > > > > Availability of this feature could also be reported as a distinct mode > > in mbm_assign_mode, say "mbm_cntr_generic", or whatever. > > > > > > A _sketch_ of this follows. This is NOT a proposal -- the key > > question is whether we are confident that we can extend the interface > > in this way in the future without breaking anything. > > > > If "yes", then the ABMC interface (as proposed by this series) works as > > a foundation to build on. > > > > --8<-- > > > > [artists's impression] > > > > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode > > mbm_cntr_generic > > [mbm_cntr_assign] > > default > > > > # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode > > # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control > > # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type > > # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type > > # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type > > # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type > > > > ... > > > > # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes > > > > etc. > > > > It is not clear to me what additional features such an interface enables. It > also looks like user space will need to track and manage counter IDs? My idea was that for these generic counters, new files could be exposed to configure what they actually count (the ..._type files shown above; or possibly via the ..._config files that already exist). The "IDs" were inteded as abstract; the number only relates the assignments in mbm_assign_control to the files created elsewhere. This wouldn't be related to IDs assigned by the hardware. If there are multiple resctrl users then using numeric IDs might be problematic; though if we go eventually in the direction of making resctrlfs multi-mountable then each mount could have its own namespace. Allowing counters to be named and configured with a mkdir()-style interface might be possible too; that might make it easier for users to coexist within a single resctrl mount (if we think that's important enough). > It sounds to me as though the issue starts with your statement > "The local/total distinction doesn't map in a consistent way onto MPAM". To > address this I expect that an MPAM system will not support nor list > mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead, > it would list the events that are appropriate to the system? Trying to match > with what Peter said [1] in the message you refer to, this may be possible: > > # cat /sys/fs/resctrl/info/L3_MON/mon_features > mbm_local_read_bytes > mbm_local_write_bytes > mbm_local_bytes > > (*) I am including mbm_local_bytes since it could be an event that can be software > defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both > counted. > > I see the support for MPAM events distinct from the support of assignable counters. > Once the MPAM events are sorted, I think that they can be assigned with existing interface. > Please help me understand if you see it differently. > > Doing so would need to come up with alphabetical letters for these events, > which seems to be needed for your proposal also? If we use possible flags of: > > mbm_local_read_bytes a > mbm_local_write_bytes b > > Then mbm_assign_control can be used as: > # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control > # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes > <value> > # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes > <sum of mbm_local_read_bytes and mbm_local_write_bytes> > > One issue would be when resctrl needs to support more than 26 events (no more flags available), > assuming that upper case would be used for "shared" counters (unless this interface is defined > differently and only few uppercase letters used for it). Would this be too low of a limit? > > Reinette > > [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@xxxxxxxxxxxxxx/ That approach would also work, where an MPAM system has events are not a reasonable approximation of the generic "total" or "local". For now we would probably stick with "total" and "local" anyway though, because the MPAM architecture doesn't natively allow the mapping onto the memory system topology to be discovered, and the information in ACPI / device tree is insufficient to tell us everything we'd need to know. But I guess what counts as "local" in particular will be quite hardware and topology dependent even on x86, so perhaps we shouldn't worry about having the behaviour match exactly (?) Regarding the code letters, my idea was that the event type might be configured by a separate file, instead of in mbm_assign_control directly, in which case running out of letters wouldn't be a problem. Alternatively, if we want to be able to expand beyond single letters, could we reserve one or more characters for extension purposes? If braces are forbidden by the syntax today, could we add support for something like the following later on, without breaking anything? # echo '//0={foo}{bar};1={bar}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control For now, my main concern would be whether this series prevents that sort of thing being added in a backwards compatible way later. I don't really see anything that is a blocker. What do you think? Cheers ---Dave