Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

Hi Reinette,

On 3/12/25 12:14, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/12/25 9:03 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 3/12/25 10:07, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 3/11/25 1:35 PM, Moger, Babu wrote:
>>>> Hi All,
>>>>
>>>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>>>
>>>>>
>>>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>>>> Hi Tony,
>>>>>>
>>>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>>>> Hi Babu,
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@xxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@xxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@xxxxxxx> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with the goal of focusing the discussion on the new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
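To make the 26-event limit concrete, here is a minimal sketch of how the per-domain part of a line such as '//0=ab;1=b' could be decoded. It assumes each lowercase letter names exactly one event (a = event 0 through z = event 25) and that '_' means "no events". parse_domain_flags() and MAX_DOMAINS are hypothetical names, not resctrl code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DOMAINS 8

/*
 * Hypothetical decoder for the per-domain part of a line such as
 * "//0=ab;1=b" (the text after the second '/'). Sets one bit per flag
 * letter in masks[domain]. Returns 0 on success, -1 on bad input.
 */
static int parse_domain_flags(const char *s, uint32_t masks[MAX_DOMAINS])
{
	for (int d = 0; d < MAX_DOMAINS; d++)
		masks[d] = 0;

	while (*s) {
		char *end;
		long dom = strtol(s, &end, 10);

		if (end == s || *end != '=' || dom < 0 || dom >= MAX_DOMAINS)
			return -1;
		for (s = end + 1; *s && *s != ';'; s++) {
			if (*s == '_')
				continue;	/* '_' requests no events */
			if (!islower((unsigned char)*s))
				return -1;	/* only 26 letters: no flag for a 27th event */
			masks[dom] |= 1u << (*s - 'a');
		}
		if (*s == ';')
			s++;
	}
	return 0;
}
```

With one bit per letter, a 27th event simply has no flag left, which is the limit being discussed.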
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>>>> match the earlier goals of (a) choosing the events counted by each counter and
>>>>>>>>>>>>>>>>>>> (b) not allocating counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>>>> understand it, though, the above contradicts global counter configuration.
>>>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
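A minimal sketch of the per-domain state implied by the last example, assuming four counters per domain and the BwType bit positions from the ABMC event table quoted later in this thread (LclFill = bit 0 through VictimBW = bit 6). Each counter carries its own owner and its own event mask, which is what lets domain 1 give all four counters to group 1 with different filters. struct cntr, struct domain and group_cntrs() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* BwType bit positions (L3_QOS_ABMC_CFG.BwType). */
enum { LclFill, RmtFill, LclNTWr, RmtNTWr, LclSlowFill, RmtSlowFill, VictimBW };
#define EVT(e) (1u << (e))

#define NUM_CNTRS 4	/* assumed counters per domain, for illustration */

/* Each counter in a domain has its own owner and its own event filter. */
struct cntr {
	int group;		/* owning monitor group, -1 if free */
	uint32_t evt_cfg;	/* BwType mask this counter counts */
};

struct domain {
	struct cntr cntrs[NUM_CNTRS];
};

/* Number of counters in @d currently owned by @group. */
static int group_cntrs(const struct domain *d, int group)
{
	int n = 0;

	for (int i = 0; i < NUM_CNTRS; i++)
		if (d->cntrs[i].group == group)
			n++;
	return n;
}

/* Domain 1 from the last example: all four counters serve group 1. */
static const struct domain domain1 = { .cntrs = {
	{ 1, EVT(LclFill) | EVT(RmtFill) },
	{ 1, EVT(LclNTWr) | EVT(RmtNTWr) },
	{ 1, EVT(LclSlowFill) | EVT(RmtSlowFill) },
	{ 1, EVT(VictimBW) },
} };
```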
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
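The counter arithmetic can be checked with a small sketch, assuming every flag letter in a domain's list needs one exclusive counter in that domain and '_' needs none. count_line() is a hypothetical helper; the group lines in the check are the example above, verbatim.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DOMAINS 8

/*
 * Add the counters consumed by one "/groupN/0=t;1=t" line to used[].
 * Each flag letter is one named configuration and needs its own
 * counter in that domain; '_' requests none. Returns -1 on bad input.
 */
static int count_line(const char *line, int used[MAX_DOMAINS])
{
	const char *s = strrchr(line, '/');	/* skip "/groupN" */

	if (!s)
		return -1;
	s++;
	while (*s) {
		char *end;
		long dom = strtol(s, &end, 10);

		if (end == s || *end != '=' || dom < 0 || dom >= MAX_DOMAINS)
			return -1;
		for (s = end + 1; *s && *s != ';'; s++)
			if (*s != '_')
				used[dom]++;
		if (*s == ';')
			s++;
	}
	return 0;
}
```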
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
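A minimal sketch of the per-domain resolution described above: the flag 't' is the same in both domains, but the event mask programmed into the assigned counter is looked up per domain. The masks use the BwType bit positions from the ABMC event table quoted later in this thread; t_cfg[] and resolve_t() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { LclFill, RmtFill, LclNTWr, RmtNTWr, LclSlowFill, RmtSlowFill, VictimBW };
#define EVT(e) (1u << (e))

/* Per-domain meaning of flag 't' (as with mbm_total_bytes_config today). */
static const uint32_t t_cfg[] = {
	/* domain 0: all seven event types */
	EVT(LclFill) | EVT(RmtFill) | EVT(LclSlowFill) | EVT(RmtSlowFill) |
		EVT(VictimBW) | EVT(LclNTWr) | EVT(RmtNTWr),
	/* domain 1: slow-memory reads excluded */
	EVT(LclFill) | EVT(RmtFill) | EVT(VictimBW) | EVT(LclNTWr) |
		EVT(RmtNTWr),
};

/* Event mask applied to a counter assigned with flag 't' in @domain. */
static uint32_t resolve_t(unsigned int domain)
{
	assert(domain < sizeof(t_cfg) / sizeof(t_cfg[0]));
	return t_cfg[domain];
}
```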
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>>>> group, and a per-group mode, where every group can be configured and
>>>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>>>> supported.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>>>> evolves.
>>>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # define event configurations
>>>>>>>>>>>>>
>>>>>>>>>>>>> ====    ===========     ==================================================
>>>>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>>>>> ====    ===========     ==================================================
>>>>>>>>>>>>> 6       VictimBW        Dirty victims from all types of memory
>>>>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to the non-local NUMA domain
>>>>>>>>>>>>> 2       LclNTWr         Non-temporal writes to the local NUMA domain
>>>>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>>>>> ====    ===========     ==================================================
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Define flags based on combinations of the above event types.
>>>>>>>>>>>>>
>>>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> v = VictimBW
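The flag definitions above can be written as bitmasks over the BwType bit positions in the table. A minimal sketch (the identifiers are illustrative, not resctrl's); it shows that r and w are disjoint and together equal t, so '/group3/0=rw' splits one group's total traffic across two counters.

```c
#include <assert.h>
#include <stdint.h>

/* BwType bit positions, from the table above. */
enum bwtype {
	LclFill     = 0,
	RmtFill     = 1,
	LclNTWr     = 2,
	RmtNTWr     = 3,
	LclSlowFill = 4,
	RmtSlowFill = 5,
	VictimBW    = 6,
};

#define EVT(e) (1u << (e))

/* The proposed flags, expressed as event masks. */
static const uint32_t cfg_t = EVT(LclFill) | EVT(RmtFill) | EVT(LclSlowFill) |
			      EVT(RmtSlowFill) | EVT(VictimBW) | EVT(LclNTWr) |
			      EVT(RmtNTWr);
static const uint32_t cfg_l = EVT(LclFill) | EVT(LclNTWr) | EVT(LclSlowFill);
static const uint32_t cfg_r = EVT(LclFill) | EVT(RmtFill) | EVT(LclSlowFill) |
			      EVT(RmtSlowFill);
static const uint32_t cfg_w = EVT(VictimBW) | EVT(LclNTWr) | EVT(RmtNTWr);
static const uint32_t cfg_v = EVT(VictimBW);
```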
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter suggested the following format earlier :
>>>>>>>>>>>>>
>>>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>>>
>>>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>>>
>>>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>>>
>>>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>>>
>>>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>>>
>>>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>>>
>>>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>>>
>>>>>>>>>> Do we need an upper limit?
>>>>>>>>>
>>>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>>>> start with 2 default configurations for now.
>>>>>
>>>>> There is a big difference between no upper limit and 2. The hardware is
>>>>> capable of supporting per-domain configurations so more flexibility is
>>>>> certainly possible. Consider the example presented by Peter in:
>>>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@xxxxxxxxxxxxxx/
>>>>>
>>>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>>>> configure all domains the same for a group.
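A minimal sketch of what a write to mon_L3_XX/assign_exclusive could do within a single domain, assuming a per-domain pool of counters tracked in a free bitmap. struct mon_domain, assign_exclusive() and unassign() are hypothetical; the 4-counter pool only keeps the example small, since the real counter count comes from hardware enumeration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CNTRS 4	/* small pool for illustration only */

/* Hypothetical per-domain state behind mon_L3_XX/assign_exclusive. */
struct mon_domain {
	uint32_t free_map;		/* bit i set: counter i is free */
	int owner[NUM_CNTRS];		/* monitor group using counter i */
	uint32_t cfg[NUM_CNTRS];	/* event mask applied to counter i */
};

static void domain_init(struct mon_domain *d)
{
	d->free_map = (1u << NUM_CNTRS) - 1;
}

/*
 * Model of writing a configuration name into assign_exclusive: take the
 * lowest free counter, apply the configuration's event mask and record
 * the parent group. Returns the counter id, or -1 if the domain is full.
 */
static int assign_exclusive(struct mon_domain *d, int group, uint32_t evt_cfg)
{
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (d->free_map & (1u << i)) {
			d->free_map &= ~(1u << i);
			d->owner[i] = group;
			d->cfg[i] = evt_cfg;
			return i;
		}
	}
	return -1;
}

/* Return counter @cntr to the domain's free pool. */
static void unassign(struct mon_domain *d, int cntr)
{
	assert(cntr >= 0 && cntr < NUM_CNTRS);
	d->free_map |= 1u << cntr;
}
```

Because allocation is per domain, a group that only runs in one domain consumes counters only there, which matches the goal of reducing counter pressure.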
>>>>>>>>>>>
>>>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>>>
>>>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>>>> aggregation files[2].
>>>>>>>>>>
>>>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>>>> counters and reading their values.
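A minimal sketch of round-robin shared assignment as described above: in each sampling period, the counters left over after exclusive assignment are handed to the next groups in the request list, wrapping around. struct shared_sched and shared_next_period() are hypothetical; a real implementation would also have to reprogram the hardware counters and accumulate their values between periods, which is omitted here.

```c
#include <assert.h>

#define MAX_GROUPS 16

/* Hypothetical round-robin state for one domain's shared counters. */
struct shared_sched {
	int groups[MAX_GROUPS];	/* groups requesting shared access */
	int ngroups;
	int next;		/* index of the next group to serve */
};

/*
 * Pick the groups that get one of @nfree shared counters for the next
 * period. Writes up to @nfree group ids into @out and returns how many.
 */
static int shared_next_period(struct shared_sched *s, int nfree, int *out)
{
	int n = nfree < s->ngroups ? nfree : s->ngroups;

	for (int i = 0; i < n; i++) {
		out[i] = s->groups[s->next];
		s->next = (s->next + 1) % s->ngroups;
	}
	return n;
}
```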
>>>>>>>>>
>>>>>>>>> OK. Let's focus on exclusive assignments for now.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>>>> results in 32 × 12 × 3 = 1,152 files, which is quite large.
>>>>>>>>>>>
>>>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>>>
>>>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>>>
>>>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>>>
>>>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>>>
>>>>>>>>>>> That was another problem we need to address.
>>>>>>>>>>
>>>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>>>
>>>>>>>>> I suggest that we provide users with an option to list the assignments
>>>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>>>
>>>>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>>>>>>> for this purpose. More details on this below.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>>>> which the count values can be read.
>>>>>>>>>>>>
>>>>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>> LclFill
>>>>>>>>>>>> LclNTWr
>>>>>>>>>>>> LclSlowFill
>>>>>>>>>>>
>>>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>>>
>>>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>>>> only looking at struct kernfs_syscall_ops.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>>>> LclNTWr
>>>>>>>>>>> LclSlowFill
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>>>> events.
>>>>>>>>>
>>>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>>>
>>>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>>>>> feature that can be enhanced in the future. Feel free to
>>>>>>>>> comment/suggest.
>>>>>>>>>
>>>>>>>>> 1. Event configurations will be in
>>>>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>>>
>>>>>>>>>      There will be two pre-defined configurations by default.
>>>>>>>>>
>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>>>      LclFill,LclNTWr,LclSlowFill,VictimBW,RmtSlowFill,RmtNTWr,RmtFill
>>>>>>>>>
>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>>>>
>>>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>>>
>>>>>>>>>      # echo "LclFill, LclNTWr, RmtFill" > \
>>>>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
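Accepting the comma-separated form in step 2 needs a parser that tolerates spaces after the commas. A minimal sketch, assuming the ABMC BwType names are the only accepted tokens (portable names are not modeled) and that bit positions follow the ABMC event table; parse_evt_list() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names in BwType bit order (bit 0 first), per the ABMC event table. */
static const char *const evt_names[] = {
	"LclFill", "RmtFill", "LclNTWr", "RmtNTWr",
	"LclSlowFill", "RmtSlowFill", "VictimBW",
};
#define NUM_EVTS (sizeof(evt_names) / sizeof(evt_names[0]))

/*
 * Parse a list such as "LclFill, LclNTWr, RmtFill" into a BwType mask.
 * Spaces around names and a trailing newline are ignored. Returns 0 and
 * sets *mask on success, -1 if any name is unknown or empty.
 */
static int parse_evt_list(const char *s, uint32_t *mask)
{
	uint32_t m = 0;

	for (;;) {
		size_t len, i;

		while (*s == ' ')
			s++;
		len = strcspn(s, ", \n");
		for (i = 0; i < NUM_EVTS; i++)	/* exact match, no prefixes */
			if (strlen(evt_names[i]) == len &&
			    strncmp(s, evt_names[i], len) == 0)
				break;
		if (i == NUM_EVTS)
			return -1;
		m |= 1u << i;
		s += len;
		while (*s == ' ' || *s == '\n')
			s++;
		if (*s == '\0')
			break;
		if (*s != ',')
			return -1;
		s++;
	}
	*mask = m;
	return 0;
}
```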
>>>>>>>
>>>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>>>> reporting "local_bytes" any more. They report something different,
>>>>>>> and users only know if they come to check the options currently
>>>>>>> configured in this file. Changing the contents without changing
>>>>>>> the name seems confusing to me.
>>>>>>
>>>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>>>
>>>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>>>
>>>>> This could be supported by following Peter's original proposal where the name
>>>>> of the counter configuration is provided by the user via a mkdir:
>>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@xxxxxxxxxxxxxx/
>>>>>
>>>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>>>
>>>> Sure. We can do that. I was thinking in the first phase, just provide the
>>>> default pre-defined configuration and option to update the configuration.
>>>>
>>>> We can add the mkdir support later. That way we can provide basic ABMC
>>>> support without the extra code complexity of mkdir support.
>>>
>>> This is not clear to me how you envision the "first phase". Is it what you
>>> proposed above, for example:
>>>       # echo "LclFill, LclNTWr, RmtFill" > \
>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>
>>> In above the counter configuration name is a file. 
>>
>> Yes. That is correct.
>>
>> There will be two configuration files by default when resctrl is mounted
>> with ABMC enabled:
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>
>>>
>>> How could mkdir support be added to this later if there are already files present?
>>
>> We already have these files once resctrl is mounted and the group is created:
>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>
>> We don't need "mkdir" support for the default configurations.
> 
> I was referring to the "mkdir" support for additional configurations that
> I understood you are thinking about adding later. For example,
> (copied from Peter's message
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@xxxxxxxxxxxxxx/):
> 
> 
>  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> LclFill
> LclNTWr
> LclSlowFill
> 
> Any "later" work needs to be backward compatible with the first phase.

Actually, we don't need the extra "event_filter" file.
This was discussed here:
https://lore.kernel.org/lkml/CALPaoChLL8p49eANYgQ0dJiFs7G=223fGae+LJyx3DwEhNeR8A@xxxxxxxxxxxxxx/

# echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes
# echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes
# echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes
# cat info/L3_MON/counter_configs/mbm_local_bytes
 LclFill
 LclNTWr
 LclSlowFill

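To make the "configuration is a list of event names" idea concrete, here is a
minimal C sketch of how a comma-separated list such as "LclFill, LclNTWr,
RmtFill" could be turned into an event bitmask. It assumes the bit layout the
resctrl documentation gives for the existing BMEC mbm_total_bytes_config and
mbm_local_bytes_config files; mbm_events[] and parse_event_list() are
hypothetical names, not resctrl code, and the sketch does not settle whether a
write adds to or replaces the current set.

```c
#include <ctype.h>
#include <string.h>

/* Assumed bit positions, mirroring the BMEC mbm_*_bytes_config layout. */
struct mbm_event {
	const char *name;
	unsigned int bit;
};

static const struct mbm_event mbm_events[] = {
	{ "LclFill",     0 },	/* reads to memory in the local NUMA domain */
	{ "RmtFill",     1 },	/* reads to memory in a non-local NUMA domain */
	{ "LclNTWr",     2 },	/* non-temporal writes to the local NUMA domain */
	{ "RmtNTWr",     3 },	/* non-temporal writes to a non-local NUMA domain */
	{ "LclSlowFill", 4 },	/* reads to slow memory in the local NUMA domain */
	{ "RmtSlowFill", 5 },	/* reads to slow memory in a non-local NUMA domain */
	{ "VictimBW",    6 },	/* dirty victims to all types of memory */
};

#define NUM_EVENTS (sizeof(mbm_events) / sizeof(mbm_events[0]))

/*
 * Hypothetical helper: parse "LclFill, LclNTWr, RmtFill" into a bitmask.
 * Returns 0 on success, -1 if any name is unknown.
 */
static int parse_event_list(const char *list, unsigned int *mask)
{
	const char *p = list;

	*mask = 0;
	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		const char *start = p;
		size_t i;

		/* Trim whitespace around this token. */
		while (len && isspace((unsigned char)*start)) {
			start++;
			len--;
		}
		while (len && isspace((unsigned char)start[len - 1]))
			len--;

		if (len) {
			for (i = 0; i < NUM_EVENTS; i++)
				if (strlen(mbm_events[i].name) == len &&
				    !strncmp(start, mbm_events[i].name, len))
					break;
			if (i == NUM_EVENTS)
				return -1;	/* unknown event name */
			*mask |= 1u << mbm_events[i].bit;
		}
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}
```

With that assumed layout, "LclFill, LclNTWr, LclSlowFill" maps to 0x15, which
matches the documented default for mbm_local_bytes_config.
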
In the future, we can add mkdir support.

# mkdir info/L3_MON/counter_configs/mbm_read_only
# echo LclFill > info/L3_MON/counter_configs/mbm_read_only
# cat info/L3_MON/counter_configs/mbm_read_only
  LclFill

# echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive

This would result in the creation of test/mon_data/mon_L3_*/mbm_read_only

So, backward compatibility is not broken.

> 
> If the first phase starts with a file:
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> ... I do not see how second phase can be backward compatible when that work
> needs a directory with the same name that contains a file for configuration:
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> Side note: I think interactions with the "event_filter" file need more
> description since it is not clear with the provided example how user space
> may want to interact with the file when adding vs replacing event configurations.
> 
>>
>> My plan was to support only the default configurations in the first phase.
>> That way there is no difference in the usage model with ABMC when mounted.
>>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>>>>
>>>>>>>>> 3. The default configurations will be used when user mounts the resctrl.
>>>>>>>>>
>>>>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>>>>
>>>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>>>      e: Exclusive
>>>>>>>>>      s: Shared
>>>>>>>>>      u: Unassigned
>>>>>>>>>
>>>>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>>>> future.
>>>>>>>>>
>>>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>> to list the assignment state of all the groups.
>>>>>>>>>
>>>>>>>>>      Format:
>>>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>>
>>>>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>
>>>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>>>> page of data (these examples never seem to reflect those AMD systems with the many
>>>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>>>> and solved when/if going this route.
>>>>
>>>> This problem is not specific to this series. I feel it is a generic problem
>>>> for many similar interfaces. I don't know how it is addressed; I may have
>>>> to investigate this. Any pointers would be helpful.
>>>
>>> Dave Martin already did a lot of analysis here. What other pointers do you need?

Yes, he did. I still need a few more details on how to implement it.
I will come back to that once we decide which way to go.
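
As a rough illustration of how quickly the listing format can outgrow a 4KB
page, here is a minimal C sketch that computes the length of one
mbm_assign_control line. It assumes domain ids numbered 0..N-1 and a single
state character per domain; assign_line_len() and num_digits() are
hypothetical helpers, not resctrl code.

```c
#include <stddef.h>
#include <string.h>

/* Number of decimal digits needed to print a non-negative id. */
static size_t num_digits(unsigned int v)
{
	size_t n = 1;

	while (v >= 10) {
		v /= 10;
		n++;
	}
	return n;
}

/*
 * Bytes needed for one line such as "test//mbm_local_bytes:0=e;1=e\n",
 * assuming domain ids 0..ndomains-1 and a one-character state each.
 */
static size_t assign_line_len(const char *ctrl, const char *mon,
			      const char *config, unsigned int ndomains)
{
	/* "<ctrl>/<mon>/<config>:" */
	size_t len = strlen(ctrl) + 1 + strlen(mon) + 1 + strlen(config) + 1;
	unsigned int d;

	for (d = 0; d < ndomains; d++)
		len += num_digits(d) + 2;	/* "<id>=<state>" */
	len += ndomains ? ndomains - 1 : 0;	/* ';' separators */
	return len + 1;				/* trailing newline */
}
```

For example, on a hypothetical system with 16 L3 domains,
assign_line_len("test", "", "mbm_local_bytes", 16) gives 92 bytes, so a group
listing both mbm_total_bytes and mbm_local_bytes takes about 184 bytes and
roughly two dozen such groups already pass 4096 bytes.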

>>>
>>>>
>>>>
>>>>>
>>>>> There seems to be two opinions about this file at moment. Would it be possible to
>>>>> summarize the discussion with pros/cons raised to make an informed selection?
>>>>> I understand that Google as represented by Peter no longer requires/requests this
>>>>> file but the motivation for this change seems new and does not seem to reduce the
>>>>> original motivation for this file. We may also want to separate requirements for reading
>>>>> from and writing to this file.
>>>>
>>>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>>>
>>>> Summary: We have two proposals.
>>>>
>>>> First one from Peter:
>>>>
>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@xxxxxxxxxxxxxx/
>>>>
>>>>
>>>> Pros
>>>> a.  Allows flexible creation of free-form names for assignable
>>>> configurations, stored in info/L3_MON/counter_configs/.
>>>>
>>>> b.  Events can be accessed using corresponding free-form names in the
>>>> mon_data directory, making it clear to users what each event represents.
>>>>
>>>>
>>>> Cons:
>>>> a. Requires three separate files for assignment in each group
>>>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>>>
>>>> b. No built-in listing support, meaning users must query each group
>>>> individually to check assignment states.
>>>>
>>>>
>>>> Second Proposal (Mine)
>>>>
>>>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@xxxxxxx/
>>>>
>>>> Pros:
>>>>
>>>> a. Maintains the flexibility of free-form names for assignable
>>>> configurations (info/L3_MON/counter_configs/).
>>>>
>>>> b. Events remain accessible via free-form names in mon_data, ensuring
>>>> clarity on their purpose.
>>>>
>>>> c. Adds the ability to list assignment states for all groups in a single
>>>> command.
>>>>
>>>> Cons:
>>>> a.  Potential buffer overflow issues when handling a large number of
>>>> groups and domains and code complexity to fix the issue.
>>>>
>>>>
>>>> Third Option: A Hybrid Approach
>>>>
>>>> We could combine elements from both proposals:
>>>>
>>>> a. Retain the free-form naming approach for assignable configurations in
>>>> info/L3_MON/counter_configs/.
>>>>
>>>> b. Use the assignment method from the first proposal:
>>>>    $mkdir test
>>>>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>>>
>>>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>>>> interface, enabling users to read assignment states for all groups in one
>>>> place. Only reading support.
>>>>
>>>>
>>>>>
>>>>>>>>>
>>>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>>>
>>>>>>>>>      Format:
>>>>>>>>>      “<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>”
>>>>>>>>>
>>>>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>
>>>>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>
>>>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>>>>
>>>>>>>>>      The corresponding events will be read in
>>>>>>>>>
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>>
>>>>>>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
>>>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>>>
>>>>>>>>> 8. In the future, there will be options to create multiple configurations
>>>>>>>>> and the corresponding directory will be created in
>>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>>>
>>>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>>>> directory? Like this:
>>>>>>>
>>>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>>>
>>>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>>>> the set of attributes picked ... but if they want to name this
>>>>>>> monitor file "brian" then they have to live with any confusion
>>>>>>> that they bring on themselves).
>>>>>>>
>>>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>>>> waiting for your input on this. It would be great if you could spend some time
>>>>>>>> on this to see if we can find common ground on the interface.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Babu
>>>>>>>
>>>>>>> -Tony
>>>>>>>
>>>>>>
>>>>>>
>>>>>> thanks
>>>>>> Babu
>>>>>
>>>>> Reinette
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
> 

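For the mbm_assign_control line format in item 5 above, here is a minimal C
sketch of how user space might split one line into its fields. It is an
illustration only: struct assign_entry, parse_assign_line() and the
MAX_NAME/MAX_DOMAINS limits are hypothetical, and it accepts only the e/s/u
states listed in item 4.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME	64	/* hypothetical limit on name length */
#define MAX_DOMAINS	64	/* hypothetical limit on domains per line */

struct assign_entry {
	char ctrl[MAX_NAME];		/* CTRL_MON group; "" for the default group */
	char mon[MAX_NAME];		/* MON group; "" when none */
	char config[MAX_NAME];		/* counter configuration name */
	unsigned int ndomains;
	unsigned int domain[MAX_DOMAINS];
	char state[MAX_DOMAINS];	/* 'e', 's' or 'u' */
};

/* Copy [start, end) into dst; fail if it does not fit. */
static int copy_field(char *dst, const char *start, const char *end)
{
	size_t len = (size_t)(end - start);

	if (len >= MAX_NAME)
		return -1;
	memcpy(dst, start, len);
	dst[len] = '\0';
	return 0;
}

/* Parse e.g. "test//mbm_local_bytes:0=e;1=e"; return 0 on success, -1 on error. */
static int parse_assign_line(const char *line, struct assign_entry *e)
{
	const char *s1, *s2, *colon, *p;

	s1 = strchr(line, '/');
	if (!s1)
		return -1;
	s2 = strchr(s1 + 1, '/');
	if (!s2)
		return -1;
	colon = strchr(s2 + 1, ':');
	if (!colon)
		return -1;
	if (copy_field(e->ctrl, line, s1) || copy_field(e->mon, s1 + 1, s2) ||
	    copy_field(e->config, s2 + 1, colon))
		return -1;

	/* "<domain_id>=<assign state>" pairs separated by ';' */
	e->ndomains = 0;
	p = colon + 1;
	while (*p && *p != '\n') {
		char *end;
		unsigned long id;

		if (!isdigit((unsigned char)*p) || e->ndomains == MAX_DOMAINS)
			return -1;
		id = strtoul(p, &end, 10);
		if (*end != '=' ||
		    (end[1] != 'e' && end[1] != 's' && end[1] != 'u'))
			return -1;
		e->domain[e->ndomains] = (unsigned int)id;
		e->state[e->ndomains] = end[1];
		e->ndomains++;
		p = end + 2;
		if (*p == ';')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}
	return 0;
}
```

An empty CTRL_MON or MON field (as in "//mbm_total_bytes:0=e;1=e") simply
yields an empty string, so the default group needs no special casing.
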
-- 
Thanks
Babu Moger



