Re: [PATCH v9 14/26] x86/resctrl: Introduce interface to display number of free counters

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Mon, 2 Dec 2024 12:15:31 -0800

Hi Babu,

On 12/2/24 11:48 AM, Moger, Babu wrote:
> On 12/2/24 12:33, Reinette Chatre wrote:
>> On 11/29/24 9:06 AM, Moger, Babu wrote:
>>> On 11/29/2024 3:59 AM, Peter Newman wrote:
>>>> On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@xxxxxxx> wrote:
>>>>> On 11/28/2024 5:10 AM, Peter Newman wrote:
>>>>>> On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre
>>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi Babu,
>>>>>>>
>>>>>>> On 11/27/24 6:57 AM, Moger, Babu wrote:
>>>>
>>>>>>>> 1. Each group needs to remember counter ids in each domain for each event.
>>>>>>>>      For example:
>>>>>>>>      Resctrl group mon1
>>>>>>>>       Total event
>>>>>>>>       dom 0 cntr_id 1,
>>>>>>>>       dom 1 cntr_id 10
>>>>>>>>       dom 2 cntr_id 11
>>>>>>>>
>>>>>>>>      Local event
>>>>>>>>       dom 0 cntr_id 2,
>>>>>>>>       dom 1 cntr_id 15
>>>>>>>>       dom 2 cntr_id 10
>>>>>>>
>>>>>>> Indeed. The challenge here is that domains may come and go so it cannot be a simple
>>>>>>> static array. As an alternative it can be an xarray indexed by the domain ID with
>>>>>>> pointers to a struct like below to contain the counters associated with the monitor
>>>>>>> group:
>>>>>>>           struct cntr_id {
>>>>>>>                   u32     mbm_total;
>>>>>>>                   u32     mbm_local;
>>>>>>>           }
>>>>>>>
>>>>>>>
>>>>>>> Thinking more about how this array needs to be managed made me wonder how the
>>>>>>> current implementation deals with domains that come and go. I do not think
>>>>>>> this is currently handled. For example, if a new domain comes online and
>>>>>>> monitoring groups had counters dynamically assigned, then these counters are
>>>>>>> not configured to the newly online domain.
>>>>>
>>>>> I am trying to understand the details of your approach here.
>>>>>>
>>>>>> In my prototype, I allocated a counter id-indexed array to each
>>>>>> monitoring domain structure for tracking the counter allocations,
>>>>>> because the hardware counters are all domain-scoped. That way the
>>>>>> tracking data goes away when the hardware does.
>>>>>>
>>>>>> I was focused on allowing all pending counter updates to a domain
>>>>>> resulting from a single mbm_assign_control write to be batched and
>>>>>> processed in a single IPI, so I structured the counter tracker
>>>>>> something like this:
>>>>>
>>>>> Not sure what you meant here. How are you batching two IPIs for two domains?
>>>>>
>>>>> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>
>>>>> This is still a single write. Two IPIs are sent separately, one for each
>>>>> domain.
>>>>>
>>>>> Are you doing something different?
>>>>
>>>> I said "all pending counter updates to a domain", whereby I meant
>>>> targeting a single domain.
>>>>
>>>> Depending on the CPU of the caller, your example write requires 1 or 2 IPIs.
>>>>
>>>> What is important is that the following write also requires 1 or 2 IPIs:
>>>>
>>>> (assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added
>>>> for readability)
>>>>
>>>> echo $'//0=t;1=t\n
>>>> /g1/0=t;1=t\n
>>>> /g2/0=t;1=t\n
>>>> /g3/0=t;1=t\n
>>>> /g4/0=t;1=t\n
>>>> /g5/0=t;1=t\n
>>>> /g6/0=t;1=t\n
>>>> /g7/0=t;1=t\n
>>>> /g8/0=t;1=t\n
>>>> /g9/0=t;1=t\n
>>>> /g10/0=t;1=t\n
>>>> /g11/0=t;1=t\n
>>>> /g12/0=t;1=t\n
>>>> /g13/0=t;1=t\n
>>>> /g14/0=t;1=t\n
>>>> /g15/0=t;1=t\n
>>>> /g16/0=t;1=t\n
>>>> /g17/0=t;1=t\n
>>>> /g18/0=t;1=t\n
>>>> /g19/0=t;1=t\n
>>>> /g20/0=t;1=t\n
>>>> /g21/0=t;1=t\n
>>>> /g22/0=t;1=t\n
>>>> /g23/0=t;1=t\n
>>>> /g24/0=t;1=t\n
>>>> /g25/0=t;1=t\n
>>>> /g26/0=t;1=t\n
>>>> /g27/0=t;1=t\n
>>>> /g28/0=t;1=t\n
>>>> /g29/0=t;1=t\n
>>>> /g30/0=t;1=t\n
>>>> /g31/0=t;1=t\n'
>>>>
>>>> My ultimate goal is for a thread bound to a particular domain to be
>>>> able to unassign and reassign the local domain's 32 counters in a
>>>> single write() with no IPIs at all. And when IPIs are required, then
>>>> no more than one per domain, regardless of the number of groups
>>>> updated.
>>>>
>>>
>>> Yes. I think I got the idea. Thanks.
>>>
>>>>
>>>>>
>>>>>>
>>>>>> struct resctrl_monitor_cfg {
>>>>>>       int closid;
>>>>>>       int rmid;
>>>>>>       int evtid;
>>>>>>       bool dirty;
>>>>>> };
>>>>>>
>>>>>> This mirrors the info needed in whatever register configures the
>>>>>> counter, plus a dirty flag to skip over the ones that don't need to be
>>>>>> updated.
>>>>>
>>>>> This is what my understanding of your implementation.
>>>>>
>>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>>> index d94abba1c716..9cebf065cc97 100644
>>>>> --- a/include/linux/resctrl.h
>>>>> +++ b/include/linux/resctrl.h
>>>>> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain {
>>>>>           u32                             *mbps_val;
>>>>>    };
>>>>>
>>>>> +struct resctrl_monitor_cfg {
>>>>> +    int closid;
>>>>> +    int rmid;
>>>>> +    int evtid;
>>>>> +    bool dirty;
>>>>> +};
>>>>> +
>>>>>    /**
>>>>>     * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor
>>>>> resource
>>>>>     * @hdr:               common header for different domain types
>>>>> @@ -116,6 +123,7 @@ struct rdt_mon_domain {
>>>>>           struct delayed_work             cqm_limbo;
>>>>>           int                             mbm_work_cpu;
>>>>>           int                             cqm_work_cpu;
>>>>> +     /* Allocate num_mbm_cntrs entries in each domain */
>>>>> +       struct resctrl_monitor_cfg      *mon_cfg;
>>>>>    };
>>>>>
>>>>>
>>>>> When a user requests an assignment for total event to the default group
>>>>> for domain 0, you go search in rdt_mon_domain(dom 0) for empty mon_cfg
>>>>> entry.
>>>>>
>>>>> If there is an empty entry, then use that entry for assignment and
>>>>> update closid, rmid, evtid and dirty = 1. We can get all these
>>>>> information from default group here.
>>>>>
>>>>> Does this make sense?
>>>>
>>>> Yes, sounds correct.
>>>
>>> I will probably add cntr_id in resctrl_monitor_cfg structure and
>>> initialize during the allocation. And rename the field 'dirty' to
>>> 'active'(or something similar) to hold the assign state for that
>>> entry. That way we have all the information required for assignment
>>> at one place. We don't need to update the rdtgroup structure.
>>>
>>> Reinette, What do you think about this approach?
>>
>> I think this approach is in the right direction. Thanks to Peter for
>> the guidance here.
>> I do not think that it is necessary to add cntr_id to resctrl_monitor_cfg
>> though, I think the cntr_id would be the index to the array instead?
> 
> Yes. I think We can use the index as cntn_id. Will let you know otherwise.
> 
> 
>>
>> It may also be worthwhile to consider using a pointer to the resource
>> group instead of storing closid and rmid directly. If used to indicate
>> initialization then an initialized pointer is easier to distinguish than
>> the closid/rmid that may have zero as valid values.
> 
> Sure. Sounds good.
> 
>>
>> I expect evtid will be enum resctrl_event_id and that raises the question
>> of whether "0" can indeed be used as an "uninitialized" value since doing
>> so would change the meaning of the enum. It may indeed keep things
>> separated by maintaining evtid as an enum resctrl_event_id and note the
>> initialization differently ... either via a pointer to a resource group
>> or entirely separately as Babu indicates later.
> 
> Sure. Will add evtid as enum resctrl_event_id and use the "state" to
> indicate assign/unassign/dirty status.

Is "assign/unassign" state needed? If resctrl_monitor_cfg contains a pointer
to the resource group to which the counter has been assigned then I expect NULL
means unassigned and a value means assigned?

Reinette