Hi Prashanth,

On Fri, Jun 10, 2016 at 2:22 PM, Prakash, Prashanth
<pprakash@xxxxxxxxxxxxxx> wrote:
> Hi Hoan,
>
> On 6/9/2016 8:25 PM, Hoan Tran wrote:
>> Hi Prashanth,
>>
>> On Thu, Jun 9, 2016 at 3:46 PM, Prakash, Prashanth
>> <pprakash@xxxxxxxxxxxxxx> wrote:
>>> Hi Hoan,
>>>
>>> On 6/9/2016 11:22 AM, Hoan Tran wrote:
>>>> Hi Prashanth,
>>>>
>>>> On Thu, May 26, 2016 at 3:02 PM, Alexey Klimov <alexey.klimov@xxxxxxx> wrote:
>>>>> On Tue, May 17, 2016 at 12:39 AM, Prashanth Prakash <pprakash@xxxxxxxxxxxxxx> wrote:
>>>>>> CPPC, defined in section 8.4.7 of the ACPI 6.1 specification, suggests:
>>>>>> "To amortize the cost of PCC transactions, OSPM should read or write
>>>>>> all PCC registers via a single read or write command when possible"
>>>>>> This patch enables opportunistic batching of frequency transition
>>>>>> requests whenever the requests happen to overlap in time.
>>>>>>
>>>>>> Currently, access to PCC is serialized by a spin lock, which does
>>>>>> not scale well as we increase the number of cores in the system. This
>>>>>> patch improves scalability by allowing different CPU cores to
>>>>>> update the PCC subspace in parallel and by batching requests, which
>>>>>> reduces certain types of operations (checking the command completion
>>>>>> bit, ringing the doorbell) by a significant margin.
>>>>>>
>>>>>> Profiling shows significant improvement in the time to service freq.
>>>>>> transition requests. Using a workload which includes multiple iterations
>>>>>> of configure+make of vim (with the -j24 option):
>>>>>> Without batching requests (without this patch):
>>>>>> 6 domains: Avg=20us/request; 24 domains: Avg=52us/request
>>>>>> With batching requests (with this patch):
>>>>>> 6 domains: Avg=16us/request; 24 domains: Avg=19us/request
>>>>>> domain: an individual CPU or a set of related CPUs whose frequency can
>>>>>> be scaled independently
>>>>> With this approach, sometimes you will send POSTCHANGE notifications
>>>>> about a frequency change for some random CPUs before the actual request
>>>>> to change the frequency was sent (and received?) through the PCC channel.
>>>>> Depending on platform/firmware/configuration, this time difference might
>>>>> be high.
>>>>>
>>>>> How vital or important is it to have the POSTCHANGE notification in the
>>>>> correct time order?
>>>>>
>>>>> Best regards,
>>>>> Alexey.
>>>>>
>>>>>
>>>>>
>>>>>> Signed-off-by: Prashanth Prakash <pprakash@xxxxxxxxxxxxxx>
>>>>>> ---
>>>>>>  drivers/acpi/cppc_acpi.c | 166 ++++++++++++++++++++++++++++++++++++++++-------
>>>>>>  1 file changed, 141 insertions(+), 25 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>>>>>> index 8adac69..3edfd50 100644
>>>>>> --- a/drivers/acpi/cppc_acpi.c
>>>>>> +++ b/drivers/acpi/cppc_acpi.c
>>>>>> @@ -43,12 +43,34 @@
>>>>>>
>>>>>>  #include <acpi/cppc_acpi.h>
>>>>>>  /*
>>>>>> - * Lock to provide mutually exclusive access to the PCC
>>>>>> - * channel. e.g. When the remote updates the shared region
>>>>>> - * with new data, the reader needs to be protected from
>>>>>> - * other CPUs activity on the same channel.
>>>>>> + * Lock to provide controlled access to the PCC channel.
>>>>>> + *
>>>>>> + * For performance-critical use cases (currently cppc_set_perf):
>>>>>> + * We need to take the read_lock and check if the channel belongs to
>>>>>> + * the OSPM before reading or writing to the PCC subspace.
>>>>>> + * We need to take the write_lock before transferring channel ownership
>>>>>> + * to the platform via a doorbell.
>>>>>> + *
>>>>>> + * For non-performance-critical use cases (init):
>>>>>> + * Take the write_lock for all purposes, which gives exclusive access.
>>>>>> + *
>>>>>> + * Why not use a spin lock?
>>>>>> + * One of the advantages of CPPC is that we can batch/group a large number
>>>>>> + * of requests from different CPUs and send one request to the platform.
>>>>>> + * Using a read-write spinlock allows us to do this in an efficient manner.
>>>>>> + * On larger multicore systems, if we get a large group of cppc_set_perf
>>>>>> + * calls, they all serialize behind a spin lock, but using a rwlock we can
>>>>>> + * execute the requests in a much more scalable manner.
>>>>>> + * See cppc_set_perf() for more details.
>>>>>> + */
>>>>>> +static DEFINE_RWLOCK(pcc_lock);
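
To restate my reading of the scheme for anyone skimming the thread: the
fast path would look roughly like the sketch below. This is my own
paraphrase, not the actual patch code -- write_pcc_reg() is a made-up
stand-in for the real register accessors in cppc_acpi.c (send_pcc_cmd()
and CMD_WRITE do exist there), and error handling plus the OSPM-ownership
check are omitted.

#include <linux/spinlock.h>

static DEFINE_RWLOCK(pcc_lock);

int cppc_set_perf_sketch(int cpu, u64 desired_perf)
{
	/*
	 * Fast path: many CPUs can hold the read lock at once. Each
	 * one writes only its own per-CPU register set in the shared
	 * PCC subspace, so concurrent writers do not conflict.
	 */
	read_lock(&pcc_lock);
	write_pcc_reg(cpu, desired_perf);	/* stand-in helper */
	read_unlock(&pcc_lock);

	/*
	 * Slow path: one CPU at a time takes the write lock to ring
	 * the doorbell. Every request written under the read lock
	 * since the last doorbell rides along in this single command,
	 * which is where the batching comes from.
	 */
	write_lock(&pcc_lock);
	send_pcc_cmd(CMD_WRITE);		/* one doorbell, many requests */
	write_unlock(&pcc_lock);

	return 0;
}
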
>>>> Any chance to use a mutex instead of a spinlock?
>>>> In some cases, such as (cmd == CMD_READ || pcc_mrtt), this spinlock
>>>> is held for a while before being unlocked, because we wait for command
>>>> completion. Other CPUs could waste CPU time spinning to acquire it.
>>>>
>>>> Thanks
>>>> Hoan
>>> CMD_READ is primarily used during init, so the returns on optimizing
>>> that path are very limited. Since the CPPC protocol requires only an ack
>>> from the platform for delivery of the request, and the platform can take
>>> its own time to do the actual frequency and voltage transition, the time
>>> we hold the lock should be very limited. At least in the profiling I have
>>> done, I haven't noticed any issues.
>> When pcc_mrtt is not zero, CPPC still has to wait for command completion.
>> As for CMD_READ, if we support reading performance feedback from the CPPC
>> counters, it's not only used during init.
> Looks like there was some earlier discussion on why spinlocks were chosen
> as well:
> https://lists.linaro.org/pipermail/linaro-acpi/2015-September/005797.html
>
> One could make an argument that pcc_mrtt and the nominal command latency
> are typically quite low (a few microseconds), as the platform only acks or
> provides some info.

With CMD_READ, the firmware updates a full CPPC table. I don't think
that's fast enough.

> Since the spec doesn't put any restriction on pcc_mrtt, theoretically one
> could design a platform with a high pcc_mrtt, and the current
> implementation would be sub-optimal there.
>
> My hesitation about changing this to a mutex/semaphore is primarily due
> to a lack of profiling data.

You can profile the open-source version; simply change the spinlock to a
mutex. I think it would be good if we can use a mutex/semaphore. I've put
a rough (untested) sketch of what I mean further down.

Thanks
Hoan

> How about we do this: I will post a v2 of this patch fixing the issue
> Alexey pointed out about the POSTCHANGE notification.
> Following that, I can change this read-write spinlock into a read-write
> semaphore, profile it, and if it looks good, submit another patch.
>
> If you have any profiling data from a different platform that suggests
> one way is better than the other, please share it as well.
>
> Thanks,
> Prashanth
>>
>> Thanks
>> Hoan
>>
>>> Having said that, I haven't done profiling to compare spinlocks vs.
>>> mutexes for this scenario. Ashwin might have some background on why
>>> spinlocks were used initially.
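
Here is the untested sketch I mentioned above -- the same structure, but
with a sleeping read-write semaphore, so a CPU that has to wait for the
completion bit (CMD_READ, or a non-zero pcc_mrtt) sleeps instead of
spinning. write_pcc_reg() is the same made-up stand-in as before; this is
only meant to show the shape of the change, not a tested patch.

#include <linux/rwsem.h>

static DECLARE_RWSEM(pcc_sem);			/* was: DEFINE_RWLOCK(pcc_lock) */

int cppc_set_perf_sketch(int cpu, u64 desired_perf)
{
	down_read(&pcc_sem);			/* was: read_lock() */
	write_pcc_reg(cpu, desired_perf);	/* stand-in helper */
	up_read(&pcc_sem);			/* was: read_unlock() */

	down_write(&pcc_sem);			/* was: write_lock() */
	send_pcc_cmd(CMD_WRITE);		/* may now sleep while waiting */
	up_write(&pcc_sem);			/* was: write_unlock() */

	return 0;
}

One caveat: this assumes cppc_set_perf() is only ever called from contexts
that may sleep (the cpufreq ->target() path should be fine); if it can be
reached from atomic context, a sleeping lock is not an option.
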
>>>
>>> Thanks,
>>> Prashanth