On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>>
>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>>>
>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>>>> For any remote call to the DSP, after sending an invocation message,
>>>>>> the fastRPC driver waits for a glink response, and during this time
>>>>>> the CPU can go into low power modes. Add polling mode support with
>>>>>> which the fastRPC driver polls continuously on a memory location
>>>>>> after sending a message to the remote subsystem, which eliminates
>>>>>> CPU wakeup and scheduling latencies and reduces fastRPC overhead.
>>>>>> With this change, the DSP always sends a glink response, which is
>>>>>> ignored if polling mode didn't time out.
>>>>> Is there a chance to implement an actual async I/O protocol with the
>>>>> help of the poll() call instead of hiding the polling / wait inside
>>>>> invoke2?
>>>> This design is based on the implementation on the DSP firmware as of today:
>>>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>>>
>>>> Can you please give some reference to the async I/O protocol that you've
>>>> suggested? I can check if it can be implemented here.
>>> As with the typical poll() call implementation:
>>> - write some data using an ioctl
>>> - call poll() / select() to wait for the data to be processed
>>> - read the data using another ioctl
>>>
>>> Getting back to your patch: from your commit message it is not clear
>>> which SoCs support this feature. Reminding you that we are supporting
>>> all kinds of platforms, including the ones that are EoLed by Qualcomm.
>>>
>>> Next, you wrote that in-driver polling eliminates CPU wakeup and
>>> scheduling. However, this should also increase power consumption. Is
>>> there any measurable difference in the latencies, given that you
>>> already use the ioctl() syscall and so there will be two context
>>> switches anyway? What is the actual impact?
>> Hi Dmitry,
>>
>> Thank you for your feedback.
>>
>> I'm currently reworking this change and adding testing details.
>> Regarding the SoC support, I'll add all the necessary information.
> Please make sure that both the kernel and the userspace can handle the
> 'non-supported' case properly.

Yes, I will include changes to handle this in both userspace and the kernel.

>
>> For now, with in-driver polling, we are seeing significant performance
>> improvements for calls with different sized buffers. On a platform that
>> supports polling, I've observed an ~80us improvement in latency. You can
>> find more details in the test results here:
>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
> Does the improvement come from the CPU not going to idle or from the
> glink response processing?

Although both contribute to the performance improvement, the major gain
comes from the CPU not going into the idle state.

Thanks,
Ekansh

>
>> Regarding your concerns about power consumption, while in-driver polling
>> eliminates CPU wakeup and scheduling, it does increase power consumption.
>> However, the performance gains seem to outweigh this increase.
>>
>> Do you think the poll implementation that you suggested above could
>> provide similar improvements?
> No, I agree here. I was more focused on userspace polling rather than
> hw polling.
>
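
For reference, my understanding of the userspace flow you outlined is
roughly the sketch below. The ioctl codes and the invoke structure are
placeholders for illustration only, not the actual fastrpc UAPI:

#include <poll.h>
#include <sys/ioctl.h>

/* Placeholder request structure and ioctl codes, for illustration only. */
struct example_invoke {
	unsigned int handle;	/* remote handle on the DSP */
	unsigned int sc;	/* scalars describing the call */
	void *args;		/* argument buffers */
};

#define EXAMPLE_IOCTL_INVOKE_SUBMIT	_IOWR('R', 100, struct example_invoke)
#define EXAMPLE_IOCTL_INVOKE_RESULT	_IOWR('R', 101, struct example_invoke)

static int async_invoke(int fd, struct example_invoke *inv)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	/* 1. queue the invocation; returns without waiting for the DSP */
	ret = ioctl(fd, EXAMPLE_IOCTL_INVOKE_SUBMIT, inv);
	if (ret)
		return ret;

	/* 2. sleep until the driver reports a completed invocation */
	ret = poll(&pfd, 1, -1);
	if (ret < 0)
		return ret;

	/* 3. fetch the result of the completed invocation */
	return ioctl(fd, EXAMPLE_IOCTL_INVOKE_RESULT, inv);
}

With something like this, the wait could also be multiplexed with other
fds in the caller's event loop, which seems to be the main benefit over
blocking inside the invoke ioctl.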