Hi Reinette,

On 12/2/24 14:15, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/2/24 11:48 AM, Moger, Babu wrote:
>> On 12/2/24 12:33, Reinette Chatre wrote:
>>> On 11/29/24 9:06 AM, Moger, Babu wrote:
>>>> On 11/29/2024 3:59 AM, Peter Newman wrote:
>>>>> On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@xxxxxxx> wrote:
>>>>>> On 11/28/2024 5:10 AM, Peter Newman wrote:
>>>>>>> On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre
>>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Babu,
>>>>>>>>
>>>>>>>> On 11/27/24 6:57 AM, Moger, Babu wrote:
>>>>>
>>>>>>>>> 1. Each group needs to remember counter ids in each domain for
>>>>>>>>> each event. For example:
>>>>>>>>>    Resctrl group mon1
>>>>>>>>>      Total event
>>>>>>>>>        dom 0  cntr_id 1,
>>>>>>>>>        dom 1  cntr_id 10
>>>>>>>>>        dom 2  cntr_id 11
>>>>>>>>>
>>>>>>>>>      Local event
>>>>>>>>>        dom 0  cntr_id 2,
>>>>>>>>>        dom 1  cntr_id 15
>>>>>>>>>        dom 2  cntr_id 10
>>>>>>>>
>>>>>>>> Indeed. The challenge here is that domains may come and go so it
>>>>>>>> cannot be a simple static array. As an alternative it can be an
>>>>>>>> xarray indexed by the domain ID with pointers to a struct like
>>>>>>>> below to contain the counters associated with the monitor group:
>>>>>>>> 	struct cntr_id {
>>>>>>>> 		u32 mbm_total;
>>>>>>>> 		u32 mbm_local;
>>>>>>>> 	}
>>>>>>>>
>>>>>>>> Thinking more about how this array needs to be managed made me
>>>>>>>> wonder how the current implementation deals with domains that
>>>>>>>> come and go. I do not think this is currently handled. For
>>>>>>>> example, if a new domain comes online and monitoring groups had
>>>>>>>> counters dynamically assigned, then these counters are not
>>>>>>>> configured to the newly online domain.
>>>>>>
>>>>>> I am trying to understand the details of your approach here.
>>>>>>>
>>>>>>> In my prototype, I allocated a counter id-indexed array to each
>>>>>>> monitoring domain structure for tracking the counter allocations,
>>>>>>> because the hardware counters are all domain-scoped.
>>>>>>> That way the
>>>>>>> tracking data goes away when the hardware does.
>>>>>>>
>>>>>>> I was focused on allowing all pending counter updates to a domain
>>>>>>> resulting from a single mbm_assign_control write to be batched and
>>>>>>> processed in a single IPI, so I structured the counter tracker
>>>>>>> something like this:
>>>>>>
>>>>>> Not sure what you meant here. How are you batching two IPIs for two
>>>>>> domains?
>>>>>>
>>>>>> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>
>>>>>> This is still a single write. Two IPIs are sent separately, one for
>>>>>> each domain.
>>>>>>
>>>>>> Are you doing something different?
>>>>>
>>>>> I said "all pending counter updates to a domain", whereby I meant
>>>>> targeting a single domain.
>>>>>
>>>>> Depending on the CPU of the caller, your example write requires 1 or 2 IPIs.
>>>>>
>>>>> What is important is that the following write also requires 1 or 2 IPIs:
>>>>>
>>>>> (assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added
>>>>> for readability)
>>>>>
>>>>> echo $'//0=t;1=t\n
>>>>> /g1/0=t;1=t\n
>>>>> /g2/0=t;1=t\n
>>>>> /g3/0=t;1=t\n
>>>>> /g4/0=t;1=t\n
>>>>> /g5/0=t;1=t\n
>>>>> /g6/0=t;1=t\n
>>>>> /g7/0=t;1=t\n
>>>>> /g8/0=t;1=t\n
>>>>> /g9/0=t;1=t\n
>>>>> /g10/0=t;1=t\n
>>>>> /g11/0=t;1=t\n
>>>>> /g12/0=t;1=t\n
>>>>> /g13/0=t;1=t\n
>>>>> /g14/0=t;1=t\n
>>>>> /g15/0=t;1=t\n
>>>>> /g16/0=t;1=t\n
>>>>> /g17/0=t;1=t\n
>>>>> /g18/0=t;1=t\n
>>>>> /g19/0=t;1=t\n
>>>>> /g20/0=t;1=t\n
>>>>> /g21/0=t;1=t\n
>>>>> /g22/0=t;1=t\n
>>>>> /g23/0=t;1=t\n
>>>>> /g24/0=t;1=t\n
>>>>> /g25/0=t;1=t\n
>>>>> /g26/0=t;1=t\n
>>>>> /g27/0=t;1=t\n
>>>>> /g28/0=t;1=t\n
>>>>> /g29/0=t;1=t\n
>>>>> /g30/0=t;1=t\n
>>>>> /g31/0=t;1=t\n'
>>>>>
>>>>> My ultimate goal is for a thread bound to a particular domain to be
>>>>> able to unassign and reassign the local domain's 32 counters in a
>>>>> single write() with no IPIs at all.
>>>>> And when IPIs are required, then
>>>>> no more than one per domain, regardless of the number of groups
>>>>> updated.
>>>>
>>>> Yes. I think I got the idea. Thanks.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> struct resctrl_monitor_cfg {
>>>>>>> 	int closid;
>>>>>>> 	int rmid;
>>>>>>> 	int evtid;
>>>>>>> 	bool dirty;
>>>>>>> };
>>>>>>>
>>>>>>> This mirrors the info needed in whatever register configures the
>>>>>>> counter, plus a dirty flag to skip over the ones that don't need to
>>>>>>> be updated.
>>>>>>
>>>>>> This is my understanding of your implementation.
>>>>>>
>>>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>>>> index d94abba1c716..9cebf065cc97 100644
>>>>>> --- a/include/linux/resctrl.h
>>>>>> +++ b/include/linux/resctrl.h
>>>>>> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain {
>>>>>>          u32                     *mbps_val;
>>>>>>  };
>>>>>>
>>>>>> +struct resctrl_monitor_cfg {
>>>>>> +       int closid;
>>>>>> +       int rmid;
>>>>>> +       int evtid;
>>>>>> +       bool dirty;
>>>>>> +};
>>>>>> +
>>>>>>  /**
>>>>>>   * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
>>>>>>   * @hdr:                common header for different domain types
>>>>>> @@ -116,6 +123,7 @@ struct rdt_mon_domain {
>>>>>>          struct delayed_work     cqm_limbo;
>>>>>>          int                     mbm_work_cpu;
>>>>>>          int                     cqm_work_cpu;
>>>>>> +       /* Allocate num_mbm_cntrs entries in each domain */
>>>>>> +       struct resctrl_monitor_cfg *mon_cfg;
>>>>>>  };
>>>>>>
>>>>>> When a user requests an assignment for the total event to the default
>>>>>> group for domain 0, you search in rdt_mon_domain (dom 0) for an empty
>>>>>> mon_cfg entry.
>>>>>>
>>>>>> If there is an empty entry, then use that entry for the assignment and
>>>>>> update closid, rmid, evtid and dirty = 1. We can get all this
>>>>>> information from the default group here.
>>>>>>
>>>>>> Does this make sense?
>>>>>
>>>>> Yes, sounds correct.
>>>>
>>>> I will probably add cntr_id in the resctrl_monitor_cfg structure and
>>>> initialize it during the allocation.
>>>> And rename the field 'dirty' to
>>>> 'active' (or something similar) to hold the assign state for that
>>>> entry. That way we have all the information required for assignment
>>>> in one place. We don't need to update the rdtgroup structure.
>>>>
>>>> Reinette, what do you think about this approach?
>>>
>>> I think this approach is in the right direction. Thanks to Peter for
>>> the guidance here.
>>> I do not think that it is necessary to add cntr_id to resctrl_monitor_cfg
>>> though, I think the cntr_id would be the index into the array instead?
>>
>> Yes. I think we can use the index as cntr_id. Will let you know otherwise.
>>
>>>
>>> It may also be worthwhile to consider using a pointer to the resource
>>> group instead of storing closid and rmid directly. If used to indicate
>>> initialization then an initialized pointer is easier to distinguish than
>>> the closid/rmid that may have zero as valid values.
>>
>> Sure. Sounds good.
>>
>>>
>>> I expect evtid will be enum resctrl_event_id and that raises the question
>>> of whether "0" can indeed be used as an "uninitialized" value since doing
>>> so would change the meaning of the enum. It may indeed keep things
>>> separated by maintaining evtid as an enum resctrl_event_id and note the
>>> initialization differently ... either via a pointer to a resource group
>>> or entirely separately as Babu indicates later.
>>
>> Sure. Will add evtid as enum resctrl_event_id and use the "state" to
>> indicate assign/unassign/dirty status.
>
> Is "assign/unassign" state needed? If resctrl_monitor_cfg contains a pointer
> to the resource group to which the counter has been assigned then I expect NULL
> means unassigned and a value means assigned?

Yes. We use the rdtgroup pointer to check the assign/unassign state. I will
drop the 'state' field. Peter can add state when he wants to use it for
optimization later.

I think we need to have the 'cntr_id' field here in resctrl_monitor_cfg.
When we access the pointer from mbm_state, we won't know which cntr_id
index it came from.

--
Thanks
Babu Moger