Re: [PATCH 6/7] thermal: netlink: Add a new event to notify CPU capabilities change

Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx> · Tue, 09 Nov 2021 13:25:42 -0800



On Tue, 2021-11-09 at 17:51 +0000, Lukasz Luba wrote:
> 
> 
> On 11/9/21 2:15 PM, Srinivas Pandruvada wrote:
> > On Tue, 2021-11-09 at 13:53 +0000, Lukasz Luba wrote:
> > > Hi Srinivas,
> > > 
> > > On 11/9/21 1:23 PM, Srinivas Pandruvada wrote:
> > > > Hi Lukasz,
> > > > 
> > > > On Tue, 2021-11-09 at 12:39 +0000, Lukasz Luba wrote:
> > > > > Hi Ricardo,
> > > > > 
> > > > > 
> > > > > On 11/6/21 1:33 AM, Ricardo Neri wrote:
> > > > > > From: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > > > > > 
> > > > > > Add a new netlink event to notify change in CPU capabilities
> > > > > > in terms of performance and efficiency.
> > > > > 
> > > > > Is this going to be handled by some 'generic' tools? If yes,
> > > > > maybe the values for 'performance' might be aligned with
> > > > > capacity [0,1024]? Or are they completely unrelated, so the
> > > > > mapping is simply impossible?
> > > > > 
> > > > 
> > > > That would have been very useful.
> > > > 
> > > > The problem is that we may not know the maximum performance, as
> > > > the system may boot with only a few CPUs (using the maxcpus
> > > > kernel command line parameter) and the user may hot-add the
> > > > rest later. So we may need to rescale when we get a new maximum
> > > > performance CPU and send that to user space.
> > > > 
> > > > We can't just use the max from the HFI table at a given instant,
> > > > as the HFI table does not necessarily contain data for all CPUs.
> > > > 
> > > > If the HFI max performance value of 255 were a value scaled to
> > > > the max performance CPU in the system, then this conversion
> > > > would have been easy. But it is not.
> > > 
> > > I see. I was asking because I'm working on similar interface and
> > > just wanted to understand your approach better. In my case we
> > > would probably simply use 'capacity' scale, or more
> > > precisely available capacity after subtracting 'thermal pressure'
> > > value.
> > > That might confuse a generic tool which listens to these socket
> > > messages, though. So probably I would have to add a new
> > > THERMAL_GENL_ATTR_CPU_CAPABILITY_* id
> > > to handle this different scale, which is normalized across CPUs.
> > I can add a field capacity_scale. In the HFI case it will always be
> > 255. In your case it will be 1024.
> > 
> > 
> 
> Sounds good, with that upper limit those tools would not build
> up assumptions (they would have to parse that scale value).
> Although, I would prefer to call it 'performance_scale' if you don't
> mind.
Sure.
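
For illustration only, here is a rough sketch of how a tool might use
such a field. It assumes the event carries a per-CPU performance value
plus the proposed performance_scale upper limit; the struct layout and
helper name are hypothetical and are not taken from the patch.

```c
/*
 * Hypothetical userspace view of one CPU capability event.
 * Field names are illustrative only. performance_scale is the proposed
 * upper limit: 255 in the HFI case, 1024 for a capacity-based source.
 */
struct cpu_capability {
	unsigned int cpu;
	unsigned int performance;
	unsigned int efficiency;
	unsigned int performance_scale;
};

/*
 * Express the reported performance as per-mille of its own scale, so
 * that a tool does not need to hard-code 255 or 1024.
 */
static unsigned int perf_permille(const struct cpu_capability *cap)
{
	if (!cap->performance_scale)
		return 0;	/* invalid scale, nothing sensible to report */

	if (cap->performance >= cap->performance_scale)
		return 1000;	/* clamp values at or above the limit */

	return cap->performance * 1000u / cap->performance_scale;
}
```

This only normalizes against the advertised limit. It does not make
values from the two sources comparable with each other: as discussed
above, the HFI value is not scaled to the max performance CPU in the
system.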

Thanks,
Srinivas

> I've done similar renaming s/capacity/performance/ in the Energy Model
> (EM) some time ago [1]. Some reasons:
> - in the scheduler we have 'Performance Domains (PDs)'
> - for GPUs we talk about 'performance', because 'capacity' sounds odd
>    in that case
> 
> [1] https://lore.kernel.org/linux-pm/20200527095854.21714-2-lukasz.luba@xxxxxxx/