Hi Prashanth,

On Thu, Jun 9, 2016 at 3:46 PM, Prakash, Prashanth <pprakash@xxxxxxxxxxxxxx> wrote:
> Hi Hoan,
>
>
> On 6/9/2016 11:22 AM, Hoan Tran wrote:
>> Hi Prashanth,
>>
>> On Thu, May 26, 2016 at 3:02 PM, Alexey Klimov <alexey.klimov@xxxxxxx> wrote:
>>> On Tue, May 17, 2016 at 12:39 AM, Prashanth Prakash <pprakash@xxxxxxxxxxxxxx> wrote:
>>>> CPPC, defined in section 8.4.7 of the ACPI 6.1 specification, suggests:
>>>> "To amortize the cost of PCC transactions, OSPM should read or write
>>>> all PCC registers via a single read or write command when possible."
>>>> This patch enables opportunistic batching of frequency transition
>>>> requests whenever the requests happen to overlap in time.
>>>>
>>>> Currently, access to PCC is serialized by a spin lock, which does
>>>> not scale well as we increase the number of cores in the system. This
>>>> patch improves scalability by allowing different CPU cores to
>>>> update the PCC subspace in parallel and by batching requests, which
>>>> reduces certain types of operations (checking the command completion
>>>> bit, ringing the doorbell) by a significant margin.
>>>>
>>>> Profiling shows significant improvement in the time to service a freq.
>>>> transition request. Using a workload which includes multiple iterations
>>>> of configure+make of vim (with the -j24 option):
>>>> Without batching requests (without this patch):
>>>> 6 domains: Avg=20us/request; 24 domains: Avg=52us/request
>>>> With batching requests (with this patch):
>>>> 6 domains: Avg=16us/request; 24 domains: Avg=19us/request
>>>> domain: an individual CPU or a set of related CPUs whose frequency can
>>>> be scaled independently
>>> With this approach, sometimes you will send POSTCHANGE notifications about
>>> a frequency change for some random CPUs before the actual request to change
>>> the frequency was sent (and received?) through the PCC channel.
>>> Depending on platform/firmware/configuration, this time difference might be high.
>>>
>>> How vital or important is it to have POSTCHANGE notifications in the correct
>>> time order?
>>>
>>> Best regards,
>>> Alexey.
>>>
>>>
>>>
>>>> Signed-off-by: Prashanth Prakash <pprakash@xxxxxxxxxxxxxx>
>>>> ---
>>>>  drivers/acpi/cppc_acpi.c | 166 ++++++++++++++++++++++++++++++++++++++++-------
>>>>  1 file changed, 141 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>>>> index 8adac69..3edfd50 100644
>>>> --- a/drivers/acpi/cppc_acpi.c
>>>> +++ b/drivers/acpi/cppc_acpi.c
>>>> @@ -43,12 +43,34 @@
>>>>
>>>>  #include <acpi/cppc_acpi.h>
>>>>  /*
>>>> - * Lock to provide mutually exclusive access to the PCC
>>>> - * channel. e.g. When the remote updates the shared region
>>>> - * with new data, the reader needs to be protected from
>>>> - * other CPUs activity on the same channel.
>>>> + * Lock to provide controlled access to the PCC channel.
>>>> + *
>>>> + * For performance-critical use cases (currently cppc_set_perf):
>>>> + * We need to take the read_lock and check if the channel belongs to OSPM
>>>> + * before reading or writing to the PCC subspace.
>>>> + * We need to take the write_lock before transferring channel ownership to
>>>> + * the platform via a doorbell.
>>>> + *
>>>> + * For non-performance-critical use cases (init):
>>>> + * Take the write_lock for all purposes, which gives exclusive access.
>>>> + *
>>>> + * Why not use a spin lock?
>>>> + * One of the advantages of CPPC is that we can batch/group a large number
>>>> + * of requests from different CPUs and send one request to the platform. Using a
>>>> + * read-write spinlock allows us to do this in an efficient manner.
>>>> + * On larger multicore systems, if we get a large group of cppc_set_perf
>>>> + * calls, they all serialize behind a spin lock, but using a rwlock we can
>>>> + * execute the requests in a much more scalable manner.
>>>> + * See cppc_set_perf() for more details.
>>>> + */
>>>> +static DEFINE_RWLOCK(pcc_lock);
>> Any chance to use a mutex instead of a spinlock?
>> In some cases, such as (cmd == CMD_READ || pcc_mrtt), because of waiting
>> for command completion, this spinlock is held for a while before being
>> unlocked. It could waste CPU time to acquire the spinlock.
>>
>> Thanks
>> Hoan
>
> The CMD_READ command is primarily used during init, so the return on
> optimizing that path is very limited. Since the CPPC protocol requires only
> an ack for delivery of the request from the platform, and the platform can
> take its own time to do the actual frequency and voltage transition, the
> time we are holding the lock should be very limited. At least, in the
> profiling I have done, I haven't noticed any issues.

When pcc_mrtt is not zero, CPPC still has to wait for the command
completion. As for CMD_READ, if we support reading performance feedback
from the CPPC counters, it's not only used during init.

Thanks
Hoan

> Having said that, I haven't done profiling to compare spinlocks vs. mutexes
> for this scenario. Ashwin might have some background on why spinlocks were
> used initially.
>
> Thanks,
> Prashanth
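
For anyone skimming the thread, a minimal sketch of the two-phase rwlock
scheme described in the patch comment, together with the pcc_mrtt
completion wait Hoan is pointing at, may help. This is illustrative only,
not the code from the patch: check_pcc_chan(), write_desired_perf(),
ring_doorbell() and cppc_set_perf_sketch() are simplified stand-ins for
the real cppc_acpi.c helpers, and the bookkeeping that lets a CPU skip
the doorbell when another CPU's command already covered its request is
omitted.

#include <linux/delay.h>
#include <linux/ktime.h>
#include <linux/spinlock.h>
#include <linux/types.h>

#define CMD_READ	0
#define CMD_WRITE	1

static DEFINE_RWLOCK(pcc_lock);
static ktime_t last_cmd_cmpl_time;
static unsigned int pcc_mrtt;	/* minimum request turnaround time, in us */

/* Stand-in: poll the shared region until the platform sets the
 * command-complete bit, i.e. hands the channel back to OSPM. */
static int check_pcc_chan(void) { return 0; }

/* Stand-in: write one CPU's desired-perf register in the PCC subspace. */
static void write_desired_perf(int cpu, u32 desired_perf) { }

/* Stand-in: ring the doorbell; the channel is now platform-owned. */
static void ring_doorbell(u16 cmd) { }

static int send_pcc_cmd(u16 cmd)
{
	int ret = 0;
	s64 elapsed;

	/* Respect the platform's minimum request turnaround time. */
	if (pcc_mrtt) {
		elapsed = ktime_us_delta(ktime_get(), last_cmd_cmpl_time);
		if (elapsed < pcc_mrtt)
			udelay(pcc_mrtt - elapsed);
	}

	ring_doorbell(cmd);

	/*
	 * Hoan's point: with a non-zero pcc_mrtt we must also record *when*
	 * the command completed, so we poll for completion right here while
	 * pcc_lock is held for write. Under a spinlock every CPU queued
	 * behind us spins for that whole window; a mutex would let them
	 * sleep instead.
	 */
	if (cmd == CMD_READ || pcc_mrtt) {
		ret = check_pcc_chan();
		last_cmd_cmpl_time = ktime_get();
	}
	return ret;
}

int cppc_set_perf_sketch(int cpu, u32 desired_perf)
{
	int ret;

	/*
	 * Phase 1 (read_lock): many CPUs may stage their per-CPU desired
	 * performance values in the shared PCC subspace in parallel, as
	 * long as OSPM currently owns the channel.
	 */
	read_lock(&pcc_lock);
	ret = check_pcc_chan();
	if (ret) {
		read_unlock(&pcc_lock);
		return ret;
	}
	write_desired_perf(cpu, desired_perf);
	read_unlock(&pcc_lock);

	/*
	 * Phase 2 (write_lock): one CPU rings the doorbell, and that single
	 * CMD_WRITE covers every request staged in phase 1 -- this is the
	 * opportunistic batching the changelog describes.
	 */
	write_lock(&pcc_lock);
	ret = send_pcc_cmd(CMD_WRITE);
	write_unlock(&pcc_lock);

	return ret;
}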