Hi Babu, On 12/2/24 12:42 PM, Moger, Babu wrote: > Hi Reinette, > > On 12/2/24 14:15, Reinette Chatre wrote: >> Hi Babu, >> >> On 12/2/24 11:48 AM, Moger, Babu wrote: >>> On 12/2/24 12:33, Reinette Chatre wrote: >>>> On 11/29/24 9:06 AM, Moger, Babu wrote: >>>>> On 11/29/2024 3:59 AM, Peter Newman wrote: >>>>>> On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@xxxxxxx> wrote: >>>>>>> On 11/28/2024 5:10 AM, Peter Newman wrote: >>>>>>>> On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre >>>>>>>> <reinette.chatre@xxxxxxxxx> wrote: >>>>>>>>> >>>>>>>>> Hi Babu, >>>>>>>>> >>>>>>>>> On 11/27/24 6:57 AM, Moger, Babu wrote: >>>>>> >>>>>>>>>> 1. Each group needs to remember counter ids in each domain for each event. >>>>>>>>>> For example: >>>>>>>>>> Resctrl group mon1 >>>>>>>>>> Total event >>>>>>>>>> dom 0 cntr_id 1, >>>>>>>>>> dom 1 cntr_id 10 >>>>>>>>>> dom 2 cntr_id 11 >>>>>>>>>> >>>>>>>>>> Local event >>>>>>>>>> dom 0 cntr_id 2, >>>>>>>>>> dom 1 cntr_id 15 >>>>>>>>>> dom 2 cntr_id 10 >>>>>>>>> >>>>>>>>> Indeed. The challenge here is that domains may come and go so it cannot be a simple >>>>>>>>> static array. As an alternative it can be an xarray indexed by the domain ID with >>>>>>>>> pointers to a struct like below to contain the counters associated with the monitor >>>>>>>>> group: >>>>>>>>> struct cntr_id { >>>>>>>>> u32 mbm_total; >>>>>>>>> u32 mbm_local; >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> Thinking more about how this array needs to be managed made me wonder how the >>>>>>>>> current implementation deals with domains that come and go. I do not think >>>>>>>>> this is currently handled. For example, if a new domain comes online and >>>>>>>>> monitoring groups had counters dynamically assigned, then these counters are >>>>>>>>> not configured to the newly online domain. >>>>>>> >>>>>>> I am trying to understand the details of your approach here. >>>>>>>> >>>>>>>> In my prototype, I allocated a counter id-indexed array to each >>>>>>>> monitoring domain structure for tracking the counter allocations, >>>>>>>> because the hardware counters are all domain-scoped. That way the >>>>>>>> tracking data goes away when the hardware does. >>>>>>>> >>>>>>>> I was focused on allowing all pending counter updates to a domain >>>>>>>> resulting from a single mbm_assign_control write to be batched and >>>>>>>> processed in a single IPI, so I structured the counter tracker >>>>>>>> something like this: >>>>>>> >>>>>>> Not sure what you meant here. How are you batching two IPIs for two domains? >>>>>>> >>>>>>> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >>>>>>> >>>>>>> This is still a single write. Two IPIs are sent separately, one for each >>>>>>> domain. >>>>>>> >>>>>>> Are you doing something different? >>>>>> >>>>>> I said "all pending counter updates to a domain", whereby I meant >>>>>> targeting a single domain. >>>>>> >>>>>> Depending on the CPU of the caller, your example write requires 1 or 2 IPIs. >>>>>> >>>>>> What is important is that the following write also requires 1 or 2 IPIs: >>>>>> >>>>>> (assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added >>>>>> for readability) >>>>>> >>>>>> echo $'//0=t;1=t\n >>>>>> /g1/0=t;1=t\n >>>>>> /g2/0=t;1=t\n >>>>>> /g3/0=t;1=t\n >>>>>> /g4/0=t;1=t\n >>>>>> /g5/0=t;1=t\n >>>>>> /g6/0=t;1=t\n >>>>>> /g7/0=t;1=t\n >>>>>> /g8/0=t;1=t\n >>>>>> /g9/0=t;1=t\n >>>>>> /g10/0=t;1=t\n >>>>>> /g11/0=t;1=t\n >>>>>> /g12/0=t;1=t\n >>>>>> /g13/0=t;1=t\n >>>>>> /g14/0=t;1=t\n >>>>>> /g15/0=t;1=t\n >>>>>> /g16/0=t;1=t\n >>>>>> /g17/0=t;1=t\n >>>>>> /g18/0=t;1=t\n >>>>>> /g19/0=t;1=t\n >>>>>> /g20/0=t;1=t\n >>>>>> /g21/0=t;1=t\n >>>>>> /g22/0=t;1=t\n >>>>>> /g23/0=t;1=t\n >>>>>> /g24/0=t;1=t\n >>>>>> /g25/0=t;1=t\n >>>>>> /g26/0=t;1=t\n >>>>>> /g27/0=t;1=t\n >>>>>> /g28/0=t;1=t\n >>>>>> /g29/0=t;1=t\n >>>>>> /g30/0=t;1=t\n >>>>>> /g31/0=t;1=t\n' >>>>>> >>>>>> My ultimate goal is for a thread bound to a particular domain to be >>>>>> able to unassign and reassign the local domain's 32 counters in a >>>>>> single write() with no IPIs at all. And when IPIs are required, then >>>>>> no more than one per domain, regardless of the number of groups >>>>>> updated. >>>>>> >>>>> >>>>> Yes. I think I got the idea. Thanks. >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>> struct resctrl_monitor_cfg { >>>>>>>> int closid; >>>>>>>> int rmid; >>>>>>>> int evtid; >>>>>>>> bool dirty; >>>>>>>> }; >>>>>>>> >>>>>>>> This mirrors the info needed in whatever register configures the >>>>>>>> counter, plus a dirty flag to skip over the ones that don't need to be >>>>>>>> updated. >>>>>>> >>>>>>> This is what my understanding of your implementation. >>>>>>> >>>>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h >>>>>>> index d94abba1c716..9cebf065cc97 100644 >>>>>>> --- a/include/linux/resctrl.h >>>>>>> +++ b/include/linux/resctrl.h >>>>>>> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain { >>>>>>> u32 *mbps_val; >>>>>>> }; >>>>>>> >>>>>>> +struct resctrl_monitor_cfg { >>>>>>> + int closid; >>>>>>> + int rmid; >>>>>>> + int evtid; >>>>>>> + bool dirty; >>>>>>> +}; >>>>>>> + >>>>>>> /** >>>>>>> * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor >>>>>>> resource >>>>>>> * @hdr: common header for different domain types >>>>>>> @@ -116,6 +123,7 @@ struct rdt_mon_domain { >>>>>>> struct delayed_work cqm_limbo; >>>>>>> int mbm_work_cpu; >>>>>>> int cqm_work_cpu; >>>>>>> + /* Allocate num_mbm_cntrs entries in each domain */ >>>>>>> + struct resctrl_monitor_cfg *mon_cfg; >>>>>>> }; >>>>>>> >>>>>>> >>>>>>> When a user requests an assignment for total event to the default group >>>>>>> for domain 0, you go search in rdt_mon_domain(dom 0) for empty mon_cfg >>>>>>> entry. >>>>>>> >>>>>>> If there is an empty entry, then use that entry for assignment and >>>>>>> update closid, rmid, evtid and dirty = 1. We can get all these >>>>>>> information from default group here. >>>>>>> >>>>>>> Does this make sense? >>>>>> >>>>>> Yes, sounds correct. >>>>> >>>>> I will probably add cntr_id in resctrl_monitor_cfg structure and >>>>> initialize during the allocation. And rename the field 'dirty' to >>>>> 'active'(or something similar) to hold the assign state for that >>>>> entry. That way we have all the information required for assignment >>>>> at one place. We don't need to update the rdtgroup structure. >>>>> >>>>> Reinette, What do you think about this approach? >>>> >>>> I think this approach is in the right direction. Thanks to Peter for >>>> the guidance here. >>>> I do not think that it is necessary to add cntr_id to resctrl_monitor_cfg >>>> though, I think the cntr_id would be the index to the array instead? >>> >>> Yes. I think We can use the index as cntn_id. Will let you know otherwise. >>> >>> >>>> >>>> It may also be worthwhile to consider using a pointer to the resource >>>> group instead of storing closid and rmid directly. If used to indicate >>>> initialization then an initialized pointer is easier to distinguish than >>>> the closid/rmid that may have zero as valid values. >>> >>> Sure. Sounds good. >>> >>>> >>>> I expect evtid will be enum resctrl_event_id and that raises the question >>>> of whether "0" can indeed be used as an "uninitialized" value since doing >>>> so would change the meaning of the enum. It may indeed keep things >>>> separated by maintaining evtid as an enum resctrl_event_id and note the >>>> initialization differently ... either via a pointer to a resource group >>>> or entirely separately as Babu indicates later. >>> >>> Sure. Will add evtid as enum resctrl_event_id and use the "state" to >>> indicate assign/unassign/dirty status. >> >> Is "assign/unassign" state needed? If resctrl_monitor_cfg contains a pointer >> to the resource group to which the counter has been assigned then I expect NULL >> means unassigned and a value means assigned? > > Yes. We use the rdtgroup pointer to check the assign/unassign state. > > I will drop the 'state' field. Peter can add state when he wants use it > for optimization later. > > I think we need to have the 'cntr_id" field here in resctrl_monitor_cfg. > When we access the pointer from mbm_state, we wont know what is cntr_id > index it came from. > oh, good point. I wonder how Peter addressed this in his PoC. As an alternative, could the cntr_id be used in mbm_state instead of a pointer? Reinette