Hi James,

> -----Original Message-----
> From: James Morse <james.morse@xxxxxxx>
> Sent: Wednesday, December 7, 2022 11:21 AM
> To: Moger, Babu <Babu.Moger@xxxxxxx>
> Cc: fenghua.yu@xxxxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
> hpa@xxxxxxxxx; paulmck@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> quic_neeraju@xxxxxxxxxxx; rdunlap@xxxxxxxxxxxxx;
> damien.lemoal@xxxxxxxxxxxxxxxxxx; songmuchun@xxxxxxxxxxxxx;
> peterz@xxxxxxxxxxxxx; jpoimboe@xxxxxxxxxx; pbonzini@xxxxxxxxxx;
> chang.seok.bae@xxxxxxxxx; pawan.kumar.gupta@xxxxxxxxxxxxxxx;
> jmattson@xxxxxxxxxx; daniel.sneddon@xxxxxxxxxxxxxxx; Das1, Sandipan
> <Sandipan.Das@xxxxxxx>; tony.luck@xxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; bagasdotme@xxxxxxxxx; eranian@xxxxxxxxxx;
> corbet@xxxxxxx; tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx;
> reinette.chatre@xxxxxxxxx
> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
>
> Hi Babu,
>
> (Nit: all the 'sysfs' in the subjects should really be 'resctrl', but as they already
> have 'x86/resctrl', could you just remove the sysfs?
> This patch would be "x86/resctrl: Add interface to write
> mbm_total_bytes_config")

Sure. Will change it.

>
> On 04/11/2022 20:01, Babu Moger wrote:
> > The current event configuration for mbm_total_bytes can be changed by
> > the user by writing to the file
> > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> >
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> >
> > Following are the types of events supported:
> >
> > ==== ===========================================================
> > Bits Description
> > ==== ===========================================================
> > 6    Dirty Victims from the QOS domain to all types of memory
> > 5    Reads to slow memory in the non-local NUMA domain
> > 4    Reads to slow memory in the local NUMA domain
> > 3    Non-temporal writes to non-local NUMA domain
> > 2    Non-temporal writes to local NUMA domain
> > 1    Reads to memory in the non-local NUMA domain
> > 0    Reads to memory in the local NUMA domain
> > ==== ===========================================================
> >
> > For example:
> > To change the mbm_total_bytes to count only reads on domain 0, the
> > bits 0, 1, 4 and 5 need to be set, which is 110011b (in hex 0x33).
> > Run the command:
> >   $ echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > To change the mbm_total_bytes to count all the slow memory reads on
> > domain 1, the bits 4 and 5 need to be set, which is 110000b (in hex 0x30).
> > Run the command:
> >   $ echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 18f9588a41cf..0cdccb69386e 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> >  	return 0;
> >  }
> >
> > +static void mon_event_config_write(void *info)
> > +{
> > +	struct mon_config_info *mon_info = info;
> > +	u32 index;
> > +
> > +	index = mon_event_config_index_get(mon_info->evtid);
> > +	if (index >= MAX_CONFIG_EVENTS) {
> > +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> > +		return;
> > +	}
> > +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> > +}
> > +
> > +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> > +			    u32 evtid, u32 val)
> > +{
> > +	struct mon_config_info mon_info = {0};
> > +	int ret = 0;
> > +
> > +	rdt_last_cmd_clear();
> > +
> > +	/* mon_config cannot be more than the supported set of events */
> > +	if (val > MAX_EVT_CONFIG_BITS) {
> > +		rdt_last_cmd_puts("Invalid event configuration\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Read the current config value first. If both are the same,
> > +	 * we don't need to write it again.
> > +	 */
> > +	mon_info.evtid = evtid;
> > +	mondata_config_read(d, &mon_info);
>
> This reads the MSR on this CPU, which gets the result for this domain... [1]

No. This read happens on a CPU in the target domain:

static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
{
	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
}

>
> > +	if (mon_info.mon_config == val)
> > +		goto write_exit;
> > +
> > +	mon_info.mon_config = val;
> > +
> > +	/*
> > +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> > +	 * domain.
> > +	 * The MSRs offset from MSR_IA32_EVT_CFG_BASE
> > +	 * are scoped at the domain level. Writing any of these MSRs
> > +	 * on one CPU is supposed to be observed by all CPUs in the
> > +	 * domain. However, the hardware team recommends updating
> > +	 * these MSRs on all the CPUs in the domain.
> > +	 */
> > +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
>
> ... but here you IPI all the CPUs in the target domain to update them. [2]

There have been some changes in this area recently. Writing the value on
all the CPUs in the domain is no longer required. I am verifying this
right now. If everything works, I can use

	smp_call_function_any(&d->cpu_mask, mon_event_config_write, &mon_info, 1);

instead. I will confirm this soon.

>
> This means you unnecessarily IPI the CPUs in the target domain if they already
> had this value, but the write syscall occurred on a domain that differs. This isn't
> what you intended, but it's benign.
> More of a problem is: Won't this get skipped if the write syscall occurs on a
> domain that happens to have the target configuration already?

Do you still think this is a problem after my comment [1] above? Or am I
missing something?

>
> Because you need the same value to be written on every CPU ... what happens
> to CPUs that are offline when the configuration is changed? Do they keep their
> previous value, or does it get reset?

The contents of this MSR are held outside of all the cores. If the value
changes while a CPU is offline, the CPU will read the new value once it
comes back online.

>
>
> I think this is best solved with a percpu variable for the current value of the
> MSR. You can then read it for CPUs in a remote domain, and only issue IPIs to
> 'sync' the value if needed. You can then re-use the sync call in
> resctrl_online_cpu() to set the MSR to whatever value it should currently be.

This may not be required, given my comments [1] and [2] above.
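To make the argument in [1] and [2] concrete, here is a rough user-space model. It is not kernel code: struct sim_domain, sim_config_write() and the EVT_* names are hypothetical. It assumes, as described above, that the event configuration register is a single domain-scoped value held outside the cores, and that MAX_EVT_CONFIG_BITS covers bits 0..6 (0x7f). Under those assumptions the skip-if-unchanged check compares against the target domain's value, not the caller's, and one write is enough for every CPU in the domain, including CPUs that are offline at the time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Event bits from the patch description table (hypothetical names). */
#define EVT_LOCAL_READS        (1u << 0)
#define EVT_REMOTE_READS       (1u << 1)
#define EVT_LOCAL_NT_WRITES    (1u << 2)
#define EVT_REMOTE_NT_WRITES   (1u << 3)
#define EVT_LOCAL_SLOW_READS   (1u << 4)
#define EVT_REMOTE_SLOW_READS  (1u << 5)
#define EVT_DIRTY_VICTIMS      (1u << 6)
/* Assumed to match MAX_EVT_CONFIG_BITS (bits 0..6). */
#define EVT_ALL_BITS           0x7fu

#define SIM_NUM_RMID 4

/*
 * Hypothetical model of one L3 domain: a single config register shared by
 * every CPU in the domain (it lives outside the cores), plus the software
 * copy of the per-RMID byte counts.
 */
struct sim_domain {
	uint32_t evt_cfg;                  /* domain-scoped "MSR" */
	uint64_t mbm_total[SIM_NUM_RMID];  /* cached counts per RMID */
	unsigned int msr_writes;           /* number of simulated MSR writes */
};

/* Returns 0 on success, -1 for an unsupported configuration (think -EINVAL). */
static int sim_config_write(struct sim_domain *d, uint32_t val)
{
	assert(d != NULL);

	/* The config cannot be more than the supported set of events. */
	if (val > EVT_ALL_BITS)
		return -1;

	/* Compare with the target domain's value, not the caller's domain. */
	if (d->evt_cfg == val)
		return 0;

	/* A single write is observed by every CPU in the domain. */
	d->evt_cfg = val;
	d->msr_writes++;

	/* Hardware resets the counters on a config change; drop stale counts. */
	memset(d->mbm_total, 0, sizeof(d->mbm_total));
	return 0;
}
```

With the patch's example masks, 0x33 corresponds to EVT_LOCAL_READS | EVT_REMOTE_READS | EVT_LOCAL_SLOW_READS | EVT_REMOTE_SLOW_READS, and 0x30 to EVT_LOCAL_SLOW_READS | EVT_REMOTE_SLOW_READS.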
> >
> > +
> > +	/*
> > +	 * When an Event Configuration is changed, the bandwidth counters
> > +	 * for all RMIDs and Events will be cleared by the hardware. The
> > +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> > +	 * every RMID on the next read to any event for every RMID.
> > +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> > +	 * cleared while it is tracked by the hardware. Clear the
> > +	 * mbm_local and mbm_total counts for all the RMIDs.
> > +	 */
> > +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> > +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> > +
> > +write_exit:
> > +	return ret;
> > +}
> >
> > +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> > +{
> > +	char *dom_str = NULL, *id_str;
> > +	unsigned long dom_id, val;
> > +	struct rdt_domain *d;
> > +	int ret = 0;
> > +
> > +next:
> > +	if (!tok || tok[0] == '\0')
> > +		return 0;
> > +
> > +	/* Start processing the strings for each domain */
> > +	dom_str = strim(strsep(&tok, ";"));
> > +	id_str = strsep(&dom_str, "=");
> > +
> > +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> > +		return -EINVAL;
> > +	}
>
> This is parsing the same format strings as parse_line(). Is there any chance that
> code could be re-used instead of duplicated? This way anything that is added to
> the format (or bugs found!) only need supporting in one place.

I looked into reusing parse_line(). It is written specifically for the
schemata file, so we can't reuse it without changing it completely.
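For reference, the "<domain id>=<hex value>;..." format accepted by mon_config_parse() is small enough that a standalone parser stays short. The user-space sketch below is illustrative only: parse_mon_config(), SIM_NUM_DOMAINS and the 0x7f limit are assumptions, and it uses strtoul() in place of the kernel's strsep()/kstrtoul(). Base-16 strtoul() accepts an optional 0x prefix, matching the 0x33 examples in the patch description. Like the patch, it applies entries one at a time, so an error in a later entry leaves earlier entries applied.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIM_NUM_DOMAINS  4      /* hypothetical number of L3 domains */
#define SIM_MAX_CFG      0x7ful /* assumed MAX_EVT_CONFIG_BITS */

/*
 * Hypothetical parser for "<dom_id>=<hex_val>[;<dom_id>=<hex_val>...]".
 * The domain id is decimal; the value is hex (optional 0x prefix).
 * Entries are applied one at a time: cfg[id] = val, set[id] = true.
 * Returns 0, or -EINVAL on a malformed entry, unknown id or bad value.
 */
static int parse_mon_config(const char *s, unsigned long cfg[], bool set[])
{
	assert(s != NULL && cfg != NULL && set != NULL);

	while (*s != '\0') {
		unsigned long id, val;
		char *end;

		while (isspace((unsigned char)*s))
			s++;
		if (*s == '\0')
			break;

		/* Domain id: decimal digits followed by '=' */
		if (!isdigit((unsigned char)*s))
			return -EINVAL;
		id = strtoul(s, &end, 10);
		if (*end != '=')
			return -EINVAL;
		s = end + 1;

		/* Event configuration: hex digits, "0x" prefix allowed */
		if (!isxdigit((unsigned char)*s))
			return -EINVAL;
		val = strtoul(s, &end, 16);

		/* Entry must end at ';' or at the end of the string */
		s = end;
		while (isspace((unsigned char)*s))
			s++;
		if (*s == ';')
			s++;
		else if (*s != '\0')
			return -EINVAL;

		if (id >= SIM_NUM_DOMAINS || val > SIM_MAX_CFG)
			return -EINVAL;
		cfg[id] = val;
		set[id] = true;
	}
	return 0;
}
```

For example, "0=0x33;1=30" sets domain 0 to 0x33 and domain 1 to 0x30, while "0x33" (no '=') or "9=0x33" (unknown domain, given the assumed four domains) is rejected with -EINVAL.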
Thanks
Babu

>
> > +	list_for_each_entry(d, &r->domains, list) {
> > +		if (d->id == dom_id) {
> > +			ret = mbm_config_write(r, d, evtid, val);
> > +			if (ret)
> > +				return -EINVAL;
> > +			goto next;
> > +		}
> > +	}
> > +
> > +	return -EINVAL;
> > +}
>
>
> Thanks,
>
> James