Hi, On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote: > When mbm_cntr_assign mode is enabled, users can designate which of the MBM > events in the CTRL_MON or MON groups should have counters assigned. > > Provide an interface for assigning MBM events by writing to the file: > /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface, > events can be assigned or unassigned as needed. > > Format is similar to the list format with addition of opcode for the > assignment operation. > "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" [...] > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index 6e29827239e0..299839bcf23f 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of, [...] > +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of, > + char *buf, size_t nbytes, loff_t off) > +{ > + struct rdt_resource *r = of->kn->parent->priv; > + char *token, *cmon_grp, *mon_grp; > + enum rdt_group_type rtype; > + int ret; > + > + /* Valid input requires a trailing newline */ > + if (nbytes == 0 || buf[nbytes - 1] != '\n') > + return -EINVAL; > + > + buf[nbytes - 1] = '\0'; > + > + cpus_read_lock(); > + mutex_lock(&rdtgroup_mutex); > + > + rdt_last_cmd_clear(); > + > + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { > + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n"); > + mutex_unlock(&rdtgroup_mutex); > + cpus_read_unlock(); > + return -EINVAL; > + } > + > + while ((token = strsep(&buf, "\n")) != NULL) { > + /* > + * The write command follows the following format: > + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>” > + * Extract the CTRL_MON group. > + */ > + cmon_grp = strsep(&token, "/"); > + As when reading this file, I think that the data can grow larger than a page and get split into multiple write() calls. I don't currently think the file needs to be redesigned, but there are some concerns about how userspace will work with it that need to be sorted out. Every monitoring group can contribute a line to this file: CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1 NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about 600 bytes per monitoring group in the worst case. We don't need to have many control and monitoring groups for this to grow potentially over 4K. We could simply place a limit on how much userspace is allowed to write to this file in one go, although this restriction feels difficult for userspace to follow -- but maybe this is workable in the short term, on current systems (?) Otherwise, since we expect this interface to be written using scripting languages, I think we need to be prepared to accept fully-buffered I/O. That means that the data may be cut at random places, not necessarily at newlines. (For smaller files such as schemata this is not such an issue, since the whole file is likely to be small enough to fit into the default stdio buffers -- this is how sysfs gets away with it IIUC.) For fully-buffered I/O, we may have to cache an incomplete line in between write() calls. If there is a dangling incomplete line when the file is closed then it is hard to tell userspace, because people often don't bother to check the return value of close(), fclose() etc. However, since it's an ABI violation for userspace to end this file with a partial line, I think it's sufficient to report that via last_cmd_status. (Making close() return -EIO still seems a good idea though, just in case userspace is listening.) I hacked up something a bit like this so that schemata could be written interactively from the shell, so I can try to port that onto this series as an illustration, if it helps. Cheers ---Dave