Re: [PATCH] drm/i915/pmu: Drop custom hotplug code

"Liang, Kan" <kan.liang@xxxxxxxxxxxxxxx> · Thu, 13 Feb 2025 08:36:46 -0500

On 2025-02-12 11:13 p.m., Lucas De Marchi wrote:
> On Tue, Jan 21, 2025 at 12:19:15PM -0500, Liang, Kan wrote:
>>
>>
>> On 2025-01-21 11:59 a.m., Lucas De Marchi wrote:
>>> On Tue, Jan 21, 2025 at 10:53:31AM -0500, Liang, Kan wrote:
>>>>
>>>>
>>>> On 2025-01-21 9:29 a.m., Lucas De Marchi wrote:
>>>>> On Mon, Jan 20, 2025 at 08:42:41PM -0500, Liang, Kan wrote:
>>>>>>>>> -static int i915_pmu_cpu_offline(unsigned int cpu, struct
>>>>>>>>> hlist_node
>>>>>>>>> *node)
>>>>>>>>> -{
>>>>>>>>> -    struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu),
>>>>>>>>> cpuhp.node);
>>>>>>>>> -    unsigned int target = i915_pmu_target_cpu;
>>>>>>>>> -
>>>>>>>>> -    /*
>>>>>>>>> -     * Unregistering an instance generates a CPU offline event
>>>>>>>>> which
>>>>>>>>> we must
>>>>>>>>> -     * ignore to avoid incorrectly modifying the shared
>>>>>>>>> i915_pmu_cpumask.
>>>>>>>>> -     */
>>>>>>>>> -    if (!pmu->registered)
>>>>>>>>> -        return 0;
>>>>>>>>> -
>>>>>>>>> -    if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
>>>>>>>>> -        target = cpumask_any_but(topology_sibling_cpumask(cpu),
>>>>>>>>> cpu);
>>>>>>>>> -
>>>>>>>>
>>>>>>>> I'm not familar with the i915 PMU, but it seems suggest a core
>>>>>>>> scope
>>>>>>>> PMU, not a system-wide scope.
>>>>>>>
>>>>>>> counter is in a complete separate device - it doesn't depend on
>>>>>>> core or
>>>>>>> die or pkg - not sure why it cared about topology_sibling_cpumask
>>>>>>> here.
>>>>>>
>>>>>> OK. But it's still a behavior change. Please make it clear in the
>>>>>> description that the patch also changes/fixes the scope from core
>>>>>> scope
>>>>>> to system-wide.
>>>>>
>>>>> sure... do you have a suggestion how to test the hotplug? For testing
>>>>> purposes, can I force the perf cpu assigned to be something other than
>>>>> the cpu0?
>>>>
>>>> Yes, it's a bit tricky to verify the hotplug if the assigned CPU is
>>>> CPU0. I don't know a way to force another CPU without changing the
>>>> code.
>>>> You may have to instrument the code for the test.
>>>>
>>>> Another test you may want to do is the perf system-wide test, e.g.,
>>>> perf
>>>> stat -a -e i915/actual-frequency/ sleep 1.
>>>>
>>>> The existing code assumes the counter is core scope. So the result
>>>> should be huge, since perf will read the counter on each core and add
>>>> them up.
>>>
>>> that is not allowed and it simply fails to init the counter:
>>>
>>> static int i915_pmu_event_init(struct perf_event *event)
>>>     ...
>>>     if (event->cpu < 0)
>>>         return -EINVAL;
>>>     if (!cpumask_test_cpu(event->cpu, &i915_pmu_cpumask))
>>>         return -EINVAL;
>>>     ...
>>> }
>>>
>>> event only succeeds the initialization in the assigned cpu. I see no
>>> differences in results (using i915/interrupts/ since freq is harder to
>>> compare):
>>>
>>> $ sudo perf stat -e i915/interrupts/  sleep 1
>>>
>>>  Performance counter stats for 'system wide':
>>>
>>>                253      i915/
>>> interrupts/                                                     
>>>        1.002215175 seconds time elapsed
>>>
>>> $ sudo perf stat -a  -e i915/interrupts/  sleep 1
>>>
>>>  Performance counter stats for 'system wide':
>>>
>>>                251      i915/
>>> interrupts/                                                     
>>>        1.000900818 seconds time elapsed
>>>
>>> Note that our cpumask attr already returns just the assigned cpu and
>>> perf-stat only tries to open on that cpu:
>>>
>>> $ strace --follow -s 1024 -e perf_event_open --  perf stat -a  -e i915/
>>> interrupts/  sleep 1
>>>
>>> [pid 55777] perf_event_open({type=0x24 /* PERF_TYPE_??? */, size=0x88 /*
>>> PERF_ATTR_SIZE_??? */, config=0x100002, sample_period=0,
>>> sample_type=PERF_SAMPLE_IDENTIFIER,
>>> read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|
>>> PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, precise_ip=0 /*
>>> arbitrary skid */, exclude_guest=1, ...}, -1, 0, -1,
>>> PERF_FLAG_FD_CLOEXEC) = 3
>>>
>>
>> I see. The behavior is not changed with the patch. It should be just the
> 
> humn... the behavior doesn't change when using perf because perf will
> read the cpumask and use it accordingly. However apparently now it's not
> working anymore to reject calls to perf_event_open() that have a cpu
> that doesn't match the cpumask.
> 
> Just like before I have this output:
> 
> $ sudo cat /sys/devices/i915/cpumask 0
> 
> However if perf_event_open() is called with cpu == 1, it succeeds.
> Example:
> 
>     attr_init(&attr);
>     perf_event_open(&attr, -1, 1, -1, 0);
> 
> I was expecting it to fail and set errno to ENODEV, but that is not the
> case. For this particular system I'm seeing these values in
> perf_try_init_event():
> 
>     event->cpu == 1
>     cpumask=0-19
>     pmu_cpumask=0
> 
> Re-reading this: it will accept any (online) cpu of the system. Same
> behavior occurs with other scopes: any cpu of that scope is accepted and
> event->cpu will still keep what the user passed in (rather than the
> calculated by perf_try_init_event(). Is that expected?

Yes, for a system-wide event, it can be read from any CPU. The CPU mask
in the sysfs only tells the perf tool that only 1 CPU is required to get
system-wide information. It doesn't have to be the advised CPU. It can
be any CPU in the scope.

https://lore.kernel.org/all/20240802151643.1691631-3-kan.liang@xxxxxxxxxxxxxxx/

Thanks,
Kan
> 
> Lucas De Marchi
> 
>> topology_sibling_cpumask() which implies a misleading message.
>> Thanks for the confirmation.
>>
>>
>>> Lucas De Marchi
>>>
>>>> But this patch claims that the counter is system-wide. With the patch,
>>>> the same perf command should only read the counter on the assigned CPU.
>>>>
>>>> Please also post the test results in the changelog. That's the reason
>>>> why the scope has to be changed.
>>>
>>> it seems that migration code is simply wrong, not that we are changing
>>> the scope here - it was already considered system-wide. I can add a
>>> paragraph in the commit message explaining it.
>>>
>>
>> Yes, please.
>>
>> Thanks,
>> Kan
>>