On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
> 
> 
> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
> > On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
> >>
> >>
> >> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
> >>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
> >>>> For any remote call to the DSP, after sending an invocation message,
> >>>> the fastRPC driver waits for a glink response, and during this time
> >>>> the CPU can go into low-power modes. Add polling mode support,
> >>>> with which the fastRPC driver polls continuously on a memory location
> >>>> after sending a message to the remote subsystem; this eliminates
> >>>> CPU wakeup and scheduling latencies and reduces fastRPC overhead.
> >>>> With this change, the DSP always sends a glink response, which is
> >>>> ignored if polling mode did not time out.
> >>> Is there a chance to implement an actual async I/O protocol with the
> >>> help of the poll() call instead of hiding the polling / wait inside
> >>> invoke2?
> >> This design is based on the implementation on the DSP firmware as of today:
> >> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
> >>
> >> Can you please give some reference to the async I/O protocol that you've
> >> suggested? I can check if it can be implemented here.
> > As with the typical poll() call implementation:
> > - write some data using an ioctl
> > - call poll() / select() to wait for the data to be processed
> > - read the data using another ioctl
> >
> > Getting back to your patch: from your commit message it is not clear
> > which SoCs support this feature. Reminding you that we support all
> > kinds of platforms, including the ones that have been EoLed by Qualcomm.
> >
> > Next, you wrote that in-driver polling eliminates CPU wakeup and
> > scheduling. However, this should also increase power consumption. Is
> > there any measurable difference in the latencies, given that you
> > already use an ioctl() syscall, and as such there will be two context
> > switches? What is the actual impact?
> 
> Hi Dmitry,
> 
> Thank you for your feedback.
> 
> I'm currently reworking this change and adding testing details. Regarding
> the SoC support, I'll add all the necessary information.

Please make sure that both the kernel and the userspace can handle the
'non-supported' case properly.

> For now, with in-driver polling, we are seeing significant performance
> improvements for calls with different sized buffers. On a platform that
> supports polling, I've observed an ~80us improvement in latency. You can
> find more details in the test results here:
> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed

Does the improvement come from the CPU not going to idle or from the
glink response processing?

> Regarding your concerns about power consumption, while in-driver polling
> eliminates CPU wakeup and scheduling, it does increase power consumption.
> However, the performance gains seem to outweigh this increase.
> 
> Do you think the poll implementation that you suggested above could
> provide similar improvements?

No, I agree here. I was more concentrated on userspace polling rather
than hw polling.

-- 
With best wishes
Dmitry
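
[For reference, a minimal userspace sketch of the poll()-based flow described
above: submit a call with one ioctl, wait for completion with poll(), then
fetch the result with a second ioctl. The device node, ioctl commands and the
request structure below are hypothetical placeholders, not the actual fastRPC
UAPI.]

#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct rpc_req {			/* placeholder request/response layout */
	uint32_t handle;
	uint32_t method;
	uint64_t buf;
	uint64_t len;
};

/* Hypothetical ioctl commands for illustration only */
#define RPC_IOCTL_SUBMIT	_IOW('R', 1, struct rpc_req)
#define RPC_IOCTL_GET_RESULT	_IOR('R', 2, struct rpc_req)

static int async_invoke(int fd, struct rpc_req *req)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	/* 1. write some data using an ioctl */
	if (ioctl(fd, RPC_IOCTL_SUBMIT, req) < 0)
		return -1;

	/* 2. poll() until the driver marks the call as complete (5 s timeout) */
	if (poll(&pfd, 1, 5000) <= 0)
		return -1;

	/* 3. read the data using another ioctl */
	return ioctl(fd, RPC_IOCTL_GET_RESULT, req);
}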