On 3/20/2025 9:27 PM, Ekansh Gupta wrote:
>
> On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
>> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
>>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>>>>> For any remote call to the DSP, after sending an invocation message,
>>>>>>> the fastRPC driver waits for a glink response, and during this time
>>>>>>> the CPU can go into low-power modes. Adding polling mode support,
>>>>>>> with which the fastRPC driver polls continuously on a memory
>>>>>>> location after sending a message to the remote subsystem; this
>>>>>>> eliminates CPU wakeup and scheduling latencies and reduces fastRPC
>>>>>>> overhead. With this change, the DSP always sends a glink response,
>>>>>>> which is ignored if polling mode did not time out.
>>>>>> Is there a chance to implement an actual async I/O protocol with the
>>>>>> help of the poll() call instead of hiding the polling / wait inside
>>>>>> invoke2?
>>>>> This design is based on the implementation in DSP firmware as of today:
>>>>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>>>>
>>>>> Can you please give some reference to the async I/O protocol that
>>>>> you've suggested? I can check if it can be implemented here.
>>>> As with the typical poll() call implementation:
>>>> - write some data using an ioctl
>>>> - call poll() / select() to wait for the data to be processed
>>>> - read data using another ioctl
>>>>
>>>> Getting back to your patch: from your commit message it is not clear
>>>> which SoCs support this feature. Reminding you that we are supporting
>>>> all kinds of platforms, including the ones that are EoLed by Qualcomm.
>>>>
>>>> Next, you wrote that in-driver polling eliminates CPU wakeup and
>>>> scheduling. However, this should also increase power consumption. Is
>>>> there any measurable difference in the latencies, given that you
>>>> already use the ioctl() syscall and as such there will be two context
>>>> switches? What is the actual impact?
>>> Hi Dmitry,
>>>
>>> Thank you for your feedback.
>>>
>>> I'm currently reworking this change and adding testing details.
>>> Regarding the SoC support, I'll add all the necessary information.
>> Please make sure that both the kernel and the userspace can handle the
>> 'non-supported' case properly.
> Yes, I will include changes to handle this in both userspace and the
> kernel.

I am seeking additional suggestions on handling the "non-supported" case
before making the changes.

Userspace: To enable DSP-side polling, a remote call is made as defined
in the DSP image. If this call fails, polling mode will not be enabled
from userspace.

Kernel: Since this is a DSP-specific feature, I plan to add a devicetree
property, such as "qcom,polling-supported", under the fastrpc node if
the DSP supports polling mode.

Does this approach seem appropriate, or is there a better way to handle
this?

Thanks,
Ekansh

>
>>> For now, with in-driver polling, we are seeing significant performance
>>> improvements for calls with different-sized buffers. On a platform
>>> that supports polling, I've observed an ~80us improvement in latency.
>>> You can find more details in the test results here:
>>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
>> Does the improvement come from the CPU not going idle or from the
>> glink response processing?
> Although both contribute to the performance improvement, the major part
> comes from the CPU not going into the idle state.
>
> Thanks,
> Ekansh
>
>>> Regarding your concerns about power consumption, while in-driver
>>> polling eliminates CPU wakeup and scheduling, it does increase power
>>> consumption. However, the performance gains seem to outweigh this
>>> increase.
>>>
>>> Do you think the poll() implementation that you suggested above could
>>> provide similar improvements?
>> No, I agree here. I was more concentrated on userspace polling rather
>> than hw polling.
>>
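
To make the in-driver ("hw") polling idea more concrete, below is a rough
sketch of the wait path I have in mind; the function name, the timeout
value and the completion marker are placeholders, not the actual patch:

#include <linux/completion.h>
#include <linux/compiler.h>
#include <linux/ktime.h>
#include <asm/processor.h>	/* cpu_relax() */

#define FASTRPC_POLL_TIMEOUT_US	4000		/* placeholder poll window */
#define FASTRPC_POLL_RESPONSE	0xdecaf		/* placeholder "done" marker */

/*
 * Sketch only: after the invoke message has been sent over glink,
 * busy-wait on a memory word that the DSP updates when the call
 * completes. If the poll window expires, fall back to the normal
 * completion signalled from the glink callback; if polling wins, the
 * glink response that still arrives is simply ignored.
 */
static int fastrpc_wait_for_response(u32 *poll, struct completion *work)
{
	u64 deadline = ktime_get_ns() + FASTRPC_POLL_TIMEOUT_US * NSEC_PER_USEC;

	while (ktime_get_ns() < deadline) {
		if (READ_ONCE(*poll) == FASTRPC_POLL_RESPONSE)
			return 0;
		cpu_relax();
	}

	return wait_for_completion_interruptible(work);
}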
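
For the kernel side of the "non-supported" handling, the check would boil
down to something like this (again only a sketch; the polling_supported
flag and where it lives are placeholders, the property name is the one
proposed above):

#include <linux/of.h>

/*
 * Sketch only: during channel setup, note whether the DSP firmware on
 * this channel supports polling mode, based on an optional boolean DT
 * property under the fastrpc node. Invocations would take the regular
 * glink wait path when the flag is false.
 */
static void fastrpc_init_polling(struct fastrpc_channel_ctx *cctx,
				 struct device_node *np)
{
	cctx->polling_supported = of_property_read_bool(np, "qcom,polling-supported");
}

Platforms whose DSP firmware does not support polling would simply omit
the property and keep today's behaviour.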