Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver

On 3/20/2025 9:27 PM, Ekansh Gupta wrote:
>
> On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
>> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
>>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>>>>> For any remote call to the DSP, after sending an invocation
>>>>>>> message, the fastRPC driver waits for the glink response, and
>>>>>>> during this time the CPU can go into low-power modes. Add polling
>>>>>>> mode support, with which the fastRPC driver polls continuously on
>>>>>>> a memory location after sending a message to the remote subsystem,
>>>>>>> eliminating CPU wakeup and scheduling latencies and reducing
>>>>>>> fastRPC overhead. With this change, the DSP always sends a glink
>>>>>>> response, which is ignored if polling mode didn't time out.
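
As a rough illustration of the mechanism that commit message describes
(fastrpc_poll_for_response, ctx->poll_mem and FASTRPC_POLL_DONE are
illustrative names, not the actual patch code):

    /* Sketch: busy-poll a shared memory word that the DSP writes on
     * completion; on timeout the caller falls back to the regular
     * glink wait.
     */
    static int fastrpc_poll_for_response(struct fastrpc_invoke_ctx *ctx,
                                         u32 timeout_us)
    {
        u32 elapsed;

        for (elapsed = 0; elapsed < timeout_us; elapsed++) {
            if (READ_ONCE(*ctx->poll_mem) == FASTRPC_POLL_DONE)
                return 0;   /* DSP done; glink response is ignored */
            udelay(1);
        }

        return -ETIMEDOUT;  /* fall back to waiting for glink */
    }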
>>>>>> Is there a chance to implement an actual async I/O protocol with the
>>>>>> help of the poll() call, instead of hiding the polling / wait inside
>>>>>> invoke2?
>>>>> This design is based on the DSP firmware implementation as it stands today:
>>>>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>>>>
>>>>> Can you please give some reference to the async I/O protocol that you've
>>>>> suggested? I can check if it can be implemented here.
>>>> As with the typical poll() call implementation:
>>>> - write some data using ioctl
>>>> - call poll() / select() to wait for the data to be processed
>>>> - read data using another ioctl
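
A rough userspace sketch of those three steps (the ioctl command names
below are hypothetical, not an existing fastrpc UAPI):

    #include <poll.h>
    #include <sys/ioctl.h>

    static int async_invoke(int fd, struct fastrpc_invoke *inv)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ret;

        /* 1. submit the call without blocking for the result */
        ret = ioctl(fd, FASTRPC_IOCTL_INVOKE_SUBMIT, inv);
        if (ret)
            return ret;

        /* 2. sleep until the driver signals completion */
        ret = poll(&pfd, 1, -1);
        if (ret < 0)
            return ret;

        /* 3. collect the completed call's results */
        return ioctl(fd, FASTRPC_IOCTL_INVOKE_REAP, inv);
    }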
>>>>
>>>> Getting back to your patch: from your commit message it is not clear
>>>> which SoCs support this feature. Reminding you that we support all
>>>> kinds of platforms, including ones that have been EoLed by Qualcomm.
>>>>
>>>> Next, you wrote that in-driver polling eliminates CPU wakeup and
>>>> scheduling. However, this should also increase power consumption. Is
>>>> there any measurable difference in the latencies, given that you
>>>> already use the ioctl() syscall and as such there will be two context
>>>> switches? What is the actual impact?
>>> Hi Dmitry,
>>>
>>> Thank you for your feedback.
>>>
>>> I'm currently reworking this change and adding testing details. Regarding the SoC
>>> support, I'll add all the necessary information.
>> Please make sure that both the kernel and the userspace can handle the
>> 'non-supported' case properly.
> Yes, I will include changes to handle in both userspace and kernel.

I am seeking additional suggestions on handling "non-supported" cases before making the
changes.

Userspace: To enable DSP-side polling, a remote call is made to a method defined in the
DSP image. If this call fails, polling mode will not be enabled from userspace.
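
A minimal sketch of that fallback, where enable_dsp_polling() stands in for
the actual remote method (not its real name):

    /* Illustrative only: keep the default glink wait if the DSP image
     * does not implement the polling-enable method.
     */
    if (enable_dsp_polling(handle) != 0)
        handle->poll_mode = false;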

Kernel: Since this is a DSP-specific feature, I plan to add a devicetree property, such
as "qcom,polling-supported", under the fastrpc node if the DSP supports polling mode.

Does this approach seem appropriate, or is there a better way to handle this?

Thanks,
Ekansh

>
>>> For now, with in-driver
>>> polling, we are seeing significant performance improvements for calls
>>> with different-sized buffers. On a platform that supports polling, I've
>>> observed an ~80us improvement in latency. You can find more details in
>>> the test results here:
>>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
>> Does the improvement come from the CPU not going to idle or from the
>> glink response processing?
> Although both contribute to the performance improvement, the major
> gain comes from the CPU not going into the idle state.
>
> Thanks,
> Ekansh
>
>>> Regarding your concerns about power consumption, while in-driver polling
>>> eliminates CPU wakeup and scheduling, it does increase power consumption.
>>> However, the performance gains seem to outweigh this increase.
>>>
>>> Do you think the poll implementation that you suggested above could provide similar
>>> improvements?
>> No, I agree here. I was more focused on userspace polling rather
>> than HW polling.
>>
