Hi Babu, On 10/29/24 4:21 PM, Babu Moger wrote: > Introduce the interface to assign MBM events in mbm_cntr_assign mode. > > Events can be enabled or disabled by writing to file > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > > Format is similar to the list format with addition of opcode for the > assignment operation. > "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > > Format for specific type of groups: > > * Default CTRL_MON group: > "//<domain_id><opcode><flags>" > > * Non-default CTRL_MON group: > "<CTRL_MON group>//<domain_id><opcode><flags>" > > * Child MON group of default CTRL_MON group: > "/<MON group>/<domain_id><opcode><flags>" > > * Child MON group of non-default CTRL_MON group: > "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > > Domain_id '*' will apply the flags on all the domains. > > Opcode can be one of the following: > > = Update the assignment to match the flags > + Assign a new MBM event without impacting existing assignments. > - Unassign a MBM event from currently assigned events. > > Assignment flags can be one of the following: > t MBM total event > l MBM local event > tl Both total and local MBM events > _ None of the MBM events. Valid only with '=' opcode. This flag cannot > be combined with other flags. > > Signed-off-by: Babu Moger <babu.moger@xxxxxxx> > --- > v9: Fixed handling special case '//0=' and '//". > Removed extra strstr() call. > Added generic failure text when assignment operation fails. > Corrected user documentation format texts. > > v8: Moved unassign as the first action during the assign modification. > Assign none "_" takes priority. Cannot be mixed with other flags. > Updated the documentation and .rst file format. htmldoc looks ok. > > v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write(). > Added mutex lock in rdtgroup_mbm_assign_control_write() while processing. > Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name. > Fixed rdtgroup_str_to_mon_state to return error for invalid flags. > Simplified the calls rdtgroup_assign_cntr by merging few functions earlier. > Removed ABMC reference in FS code. > Reinette commented about handling the combination of flags like 'lt_' and '_lt'. > Not sure if we need to change the behaviour here. Processed them sequencially right now. > Users have the liberty to pass the flags. Restricting it might be a problem later. > > v6: Added support assign all if domain id is '*' > Fixed the allocation of counter id if it not assigned already. > > v5: Interface name changed from mbm_assign_control to mbm_control. > Fixed opcode and flags combination. > '=_" is valid. > "-_" amd "+_" is not valid. > Minor message update. > Renamed the function with prefix - rdtgroup_. > Corrected few documentation mistakes. > Rebase related changes after SNC support. > > v4: Added domain specific assignments. Fixed the opcode parsing. > > v3: New patch. > Addresses the feedback to provide the global assignment interface. > https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@xxxxxxxxx/ > --- > Documentation/arch/x86/resctrl.rst | 116 +++++++++++- > arch/x86/kernel/cpu/resctrl/internal.h | 10 ++ > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 236 ++++++++++++++++++++++++- > 3 files changed, 360 insertions(+), 2 deletions(-) > > diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst > index 590727bec44b..d0a107d251ec 100644 > --- a/Documentation/arch/x86/resctrl.rst > +++ b/Documentation/arch/x86/resctrl.rst > @@ -347,7 +347,8 @@ with the following files: > t MBM total event is assigned. > l MBM local event is assigned. > tl Both MBM total and local events are assigned. > - _ None of the MBM events are assigned. > + _ None of the MBM events are assigned. Only works with opcode '=' for write > + and cannot be combined with other flags. > > Examples: > :: > @@ -365,6 +366,119 @@ with the following files: > There are four resctrl groups. All the groups have total and local MBM events > assigned on domain 0 and 1. > > + Assignment state can be updated by writing to the interface. This is already a bit far from original definition so it may help to be specific what is meant with "the interface". For example, Assignment state can be updated by writing to "mbm_assign_control". > + > + Format is similar to the list format with addition of opcode for the > + assignment operation. > + > + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > + > + Format for each type of groups: "Format for each type of group" or "Format of each type of group"? > + > + * Default CTRL_MON group: > + "//<domain_id><opcode><flags>" > + > + * Non-default CTRL_MON group: > + "<CTRL_MON group>//<domain_id><opcode><flags>" > + > + * Child MON group of default CTRL_MON group: > + "/<MON group>/<domain_id><opcode><flags>" > + > + * Child MON group of non-default CTRL_MON group: > + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > + > + Domain_id '*' will apply the flags on all the domains. "apply the flags on all the domains" -> "apply the flags to all the domains"? > + > + Opcode can be one of the following: > + :: > + > + = Update the assignment to match the MBM event. > + + Assign a new MBM event without impacting existing assignments. > + - Unassign a MBM event from currently assigned events. > + > + Examples: > + Initial group status: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; > + //0=tl;1=tl; > + /child_default_mon_grp/0=tl;1=tl; > + > + To update the default group to assign only total MBM event on domain 0: > + :: > + > + # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; > + //0=t;1=tl; > + /child_default_mon_grp/0=tl;1=tl; > + > + To update the MON group child_default_mon_grp to remove total MBM event on domain 1: > + :: > + > + # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; > + //0=t;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > + To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign > + both local and total MBM events on domain 1: > + :: > + > + # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > > + /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; > + //0=t;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > + To update the default group to add a local MBM event domain 0. "." -> ":" > + :: > + > + # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; > + //0=tl;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > + To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the > + MBM events on all the domains. "." -> ":" > + :: > + > + # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=_;1=_; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; > + //0=tl;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > "max_threshold_occupancy": > Read/write file provides the largest value (in > bytes) at which a previously used LLC_occupancy > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h > index b90d8c90b4b6..3ccaea6a2803 100644 > --- a/arch/x86/kernel/cpu/resctrl/internal.h > +++ b/arch/x86/kernel/cpu/resctrl/internal.h > @@ -74,6 +74,16 @@ > */ > #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2) > > +/* > + * Assignment flags for mbm_cntr_assign feature > + */ "mbm_cntr_assign feature" -> "mbm_cntr_assign mode"? > +enum { > + ASSIGN_NONE = 0, > + ASSIGN_TOTAL = BIT(QOS_L3_MBM_TOTAL_EVENT_ID), > + ASSIGN_LOCAL = BIT(QOS_L3_MBM_LOCAL_EVENT_ID), > + ASSIGN_INVALID, > +}; > + > /** > * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that > * aren't marked nohz_full > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index 5cc40eacbe85..9fe419d0c536 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -1082,6 +1082,239 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of, > return 0; > } > > +static int rdtgroup_str_to_mon_state(char *flag) It seems strange to me that a variable used to contain flag bits is of type int. Why is it not unsigned? > +{ > + int i, mon_state = ASSIGN_NONE; > + > + if (!strlen(flag)) > + return ASSIGN_INVALID; > + > + for (i = 0; i < strlen(flag); i++) { > + switch (*(flag + i)) { > + case 't': > + mon_state |= ASSIGN_TOTAL; > + break; > + case 'l': > + mon_state |= ASSIGN_LOCAL; > + break; > + case '_': > + return ASSIGN_NONE; > + default: > + return ASSIGN_INVALID; > + } > + } > + > + return mon_state; > +} > + > +static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype, > + char *p_grp, char *c_grp) > +{ > + struct rdtgroup *rdtg, *crg; > + > + if (rtype == RDTCTRL_GROUP && *p_grp == '\0') { > + return &rdtgroup_default; > + } else if (rtype == RDTCTRL_GROUP) { > + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) > + if (!strcmp(p_grp, rdtg->kn->name)) > + return rdtg; > + } else if (rtype == RDTMON_GROUP) { > + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) { > + if (!strcmp(p_grp, rdtg->kn->name)) { > + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, > + mon.crdtgrp_list) { > + if (!strcmp(c_grp, crg->kn->name)) > + return crg; > + } > + } > + } > + } > + > + return NULL; > +} > + > +static int rdtgroup_process_flags(struct rdt_resource *r, > + enum rdt_group_type rtype, > + char *p_grp, char *c_grp, char *tok) > +{ > + int op, mon_state, assign_state, unassign_state; Same comment about type ... these *_state variables are used to contain bits representing the flags of the various states. An unsigned variable seems more appropriate. > + char *dom_str, *id_str, *op_str; > + struct rdt_mon_domain *d; > + struct rdtgroup *rdtgrp; > + unsigned long dom_id; > + int ret, found = 0; Could found be boolean? > + > + rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp); > + > + if (!rdtgrp) { > + rdt_last_cmd_puts("Not a valid resctrl group\n"); > + return -EINVAL; > + } > + > +next: > + if (!tok || tok[0] == '\0') > + return 0; > + > + /* Start processing the strings for each domain */ > + dom_str = strim(strsep(&tok, ";")); > + > + op_str = strpbrk(dom_str, "=+-"); > + > + if (op_str) { > + op = *op_str; > + } else { > + rdt_last_cmd_puts("Missing operation =, +, - character\n"); > + return -EINVAL; > + } > + > + id_str = strsep(&dom_str, "=+-"); > + > + /* Check for domain id '*' which means all domains */ > + if (id_str && *id_str == '*') { > + d = NULL; > + goto check_state; > + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) { > + rdt_last_cmd_puts("Missing domain id\n"); > + return -EINVAL; > + } > + > + /* Verify if the dom_id is valid */ > + list_for_each_entry(d, &r->mon_domains, hdr.list) { > + if (d->hdr.id == dom_id) { > + found = 1; > + break; > + } > + } > + > + if (!found) { > + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id); > + return -EINVAL; > + } I am missing how "found" is handled on second iteration. If an invalid domain follows a valid domain it seems like "found" remains set from previous iteration? > + > +check_state: > + mon_state = rdtgroup_str_to_mon_state(dom_str); > + > + if (mon_state == ASSIGN_INVALID) { > + rdt_last_cmd_puts("Invalid assign flag\n"); > + goto out_fail; > + } > + > + assign_state = 0; > + unassign_state = 0; > + > + switch (op) { > + case '+': > + if (mon_state == ASSIGN_NONE) { > + rdt_last_cmd_puts("Invalid assign opcode\n"); > + goto out_fail; > + } > + assign_state = mon_state; > + break; > + case '-': > + if (mon_state == ASSIGN_NONE) { > + rdt_last_cmd_puts("Invalid assign opcode\n"); > + goto out_fail; > + } > + unassign_state = mon_state; > + break; > + case '=': > + assign_state = mon_state; > + unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state; > + break; > + default: > + break; > + } > + > + if (unassign_state & ASSIGN_TOTAL) { > + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + if (unassign_state & ASSIGN_LOCAL) { > + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + if (assign_state & ASSIGN_TOTAL) { > + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + if (assign_state & ASSIGN_LOCAL) { > + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + goto next; > + > +out_fail: > + rdt_last_cmd_printf("Assign operation '%c%s' failed on the group %s/%s/\n", > + op, dom_str, p_grp, c_grp); > + Can the domain id be printed also? This seems only piece missing to understand what failed. > + return -EINVAL; > +} > + > +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of, > + char *buf, size_t nbytes, loff_t off) > +{ > + struct rdt_resource *r = of->kn->parent->priv; > + char *token, *cmon_grp, *mon_grp; > + enum rdt_group_type rtype; > + int ret; > + > + /* Valid input requires a trailing newline */ > + if (nbytes == 0 || buf[nbytes - 1] != '\n') > + return -EINVAL; > + > + buf[nbytes - 1] = '\0'; > + > + cpus_read_lock(); > + mutex_lock(&rdtgroup_mutex); > + > + rdt_last_cmd_clear(); > + > + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { > + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n"); > + mutex_unlock(&rdtgroup_mutex); > + cpus_read_unlock(); > + return -EINVAL; > + } > + > + while ((token = strsep(&buf, "\n")) != NULL) { > + /* > + * The write command follows the following format: > + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>” > + * Extract the CTRL_MON group. > + */ > + cmon_grp = strsep(&token, "/"); > + > + /* > + * Extract the MON_GROUP. > + * strsep returns empty string for contiguous delimiters. > + * Empty mon_grp here means it is a RDTCTRL_GROUP. > + */ > + mon_grp = strsep(&token, "/"); > + > + if (*mon_grp == '\0') > + rtype = RDTCTRL_GROUP; > + else > + rtype = RDTMON_GROUP; > + > + ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token); > + if (ret) > + break; > + } > + > + mutex_unlock(&rdtgroup_mutex); > + cpus_read_unlock(); > + > + return ret ?: nbytes; > +} > + > #ifdef CONFIG_PROC_CPU_RESCTRL > > /* > @@ -2383,9 +2616,10 @@ static struct rftype res_common_files[] = { > }, > { > .name = "mbm_assign_control", > - .mode = 0444, > + .mode = 0644, > .kf_ops = &rdtgroup_kf_single_ops, > .seq_show = rdtgroup_mbm_assign_control_show, > + .write = rdtgroup_mbm_assign_control_write, > }, > { > .name = "mbm_assign_mode", Reinette