On 3/21/2025 4:47 PM, Dmitry Baryshkov wrote:
> On Fri, 21 Mar 2025 at 12:18, Ekansh Gupta
> <ekansh.gupta@xxxxxxxxxxxxxxxx> wrote:
>>
>> On 3/20/2025 9:27 PM, Ekansh Gupta wrote:
>>> On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
>>>> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>>>>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
>>>>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>>>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>>>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>>>>>>> For any remote call to the DSP, after sending an invocation
>>>>>>>>> message, the fastRPC driver waits for a glink response, and
>>>>>>>>> during this time the CPU can go into low-power modes. Add
>>>>>>>>> polling mode support, with which the fastRPC driver polls
>>>>>>>>> continuously on a memory location after sending a message to
>>>>>>>>> the remote subsystem. This eliminates CPU wakeup and
>>>>>>>>> scheduling latencies and reduces fastRPC overhead. With this
>>>>>>>>> change, the DSP always sends a glink response, which is
>>>>>>>>> ignored if polling mode didn't time out.
>>>>>>>> Is there a chance to implement an actual async I/O protocol
>>>>>>>> with the help of the poll() call, instead of hiding the
>>>>>>>> polling / wait inside invoke2?
>>>>>>> This design is based on the implementation on the DSP firmware
>>>>>>> as of today. Call flow:
>>>>>>> https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>>>>>>
>>>>>>> Can you please give some reference to the async I/O protocol
>>>>>>> that you've suggested? I can check if it can be implemented
>>>>>>> here.
>>>>>> As with the typical poll() call implementation:
>>>>>> - write some data using an ioctl
>>>>>> - call poll() / select() to wait for the data to be processed
>>>>>> - read the data using another ioctl
>>>>>>
>>>>>> Getting back to your patch: from your commit message it is not
>>>>>> clear which SoCs support this feature. Keep in mind that we
>>>>>> support all kinds of platforms, including the ones that have
>>>>>> been EOLed by Qualcomm.
>>>>>>
>>>>>> Next, you wrote that in-driver polling eliminates CPU wakeup
>>>>>> and scheduling. However, this should also increase power
>>>>>> consumption. Is there any measurable difference in the
>>>>>> latencies, given that you already use the ioctl() syscall and
>>>>>> there will therefore be two context switches anyway? What is
>>>>>> the actual impact?
>>>>> Hi Dmitry,
>>>>>
>>>>> Thank you for your feedback.
>>>>>
>>>>> I'm currently reworking this change and adding testing details.
>>>>> Regarding the SoC support, I'll add all the necessary
>>>>> information.
>>>> Please make sure that both the kernel and the userspace can
>>>> handle the 'non-supported' case properly.
>>> Yes, I will include changes to handle this in both userspace and
>>> the kernel.
>> I am seeking additional suggestions on handling the "non-supported"
>> case before making the changes.
>>
>> Userspace: To enable DSP-side polling, a remote call is made as
>> defined in the DSP image. If this call fails, polling mode will not
>> be enabled from userspace.
> No. Instead, userspace should check with the kernel which
> capabilities are supported. Don't perform API calls that are known
> to fail.
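
A minimal sketch of such a capability probe, using the existing
FASTRPC_IOCTL_GET_DSP_INFO uapi; the polling-mode attribute ID and
the CDSP domain number here are assumptions made for illustration:

/*
 * Hedged sketch of a userspace capability probe. The ioctl and the
 * struct are the existing fastrpc uapi; FASTRPC_ATTR_POLL_MODE is a
 * hypothetical attribute ID that the driver would have to define.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <misc/fastrpc.h>

#define FASTRPC_ATTR_POLL_MODE  8       /* hypothetical attribute ID */
#define CDSP_DOMAIN             3       /* compute DSP, by convention */

static int dsp_supports_polling(int fd)
{
        struct fastrpc_ioctl_capability cap = {
                .domain = CDSP_DOMAIN,
                .attribute_id = FASTRPC_ATTR_POLL_MODE,
        };

        /* older kernels reject unknown attributes: treat as unsupported */
        if (ioctl(fd, FASTRPC_IOCTL_GET_DSP_INFO, &cap) < 0)
                return 0;

        return cap.capability != 0;
}

int main(void)
{
        int fd = open("/dev/fastrpc-cdsp", O_RDWR);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        printf("DSP polling mode %ssupported\n",
               dsp_supports_polling(fd) ? "" : "not ");
        return 0;
}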
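
For reference, the three-step poll() flow Dmitry outlined earlier in
the thread could take roughly this shape in userspace. struct
fastrpc_invoke is the existing uapi; the two async ioctls are
hypothetical placeholders, not part of the current fastrpc uapi:

/*
 * Rough shape of the write-ioctl / poll / read-ioctl protocol.
 * FASTRPC_IOCTL_INVOKE_ASYNC and FASTRPC_IOCTL_INVOKE_RSP are
 * made-up ioctl numbers used only to illustrate the flow.
 */
#include <poll.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <misc/fastrpc.h>

/* hypothetical ioctl numbers, for illustration only */
#define FASTRPC_IOCTL_INVOKE_ASYNC      _IOWR('R', 30, struct fastrpc_invoke)
#define FASTRPC_IOCTL_INVOKE_RSP        _IOWR('R', 31, struct fastrpc_invoke)

static int invoke_async(int fd, struct fastrpc_invoke *inv)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ret;

        /* 1. submit the invocation without blocking for the result */
        ret = ioctl(fd, FASTRPC_IOCTL_INVOKE_ASYNC, inv);
        if (ret < 0)
                return ret;

        /* 2. sleep until the driver flags a completed response */
        ret = poll(&pfd, 1, -1);
        if (ret < 0)
                return ret;

        /* 3. fetch the result with a second ioctl */
        return ioctl(fd, FASTRPC_IOCTL_INVOKE_RSP, inv);
}

In this scheme the driver's poll handler would report POLLIN once the
glink response for a pending invocation arrives, so userspace sleeps
in poll() instead of blocking inside the invoke ioctl.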
>> Kernel: Since this is a DSP-specific feature, I plan to add a
>> devicetree property, such as "qcom,polling-supported", under the
>> fastrpc node if the DSP supports polling mode.
> This doesn't sound like a logical solution. The kernel already knows
> the hardware that it is running on. As such, there should be no need
> to further describe the hardware in DT. If the DSP firmware can
> report its capabilities, use that. If not, extend the schema to add
> an SoC-specific compatible string. As a last resort we can use
> of_machine_is_compatible().

Thanks for your suggestions. I'll try these out.

--Ekansh

>
>> Does this approach seem appropriate, or is there a better way to
>> handle this?
>>
>> Thanks,
>> Ekansh
>>
>>>>> For now, with in-driver polling, we are seeing significant
>>>>> performance improvements for calls with different-sized buffers.
>>>>> On a platform that supports polling, I've observed an ~80us
>>>>> improvement in latency. You can find more details in the test
>>>>> results here:
>>>>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
>>>> Does the improvement come from the CPU not going to idle or from
>>>> the glink response processing?
>>> Although both contribute to the performance improvement, the major
>>> part comes from the CPU not going into the idle state.
>>>
>>> Thanks,
>>> Ekansh
>>>
>>>>> Regarding your concerns about power consumption, while in-driver
>>>>> polling eliminates CPU wakeup and scheduling, it does increase
>>>>> power consumption. However, the performance gains seem to
>>>>> outweigh this increase.
>>>>>
>>>>> Do you think the poll implementation that you suggested above
>>>>> could provide similar improvements?
>>>> No, I agree here. I was concentrating more on userspace polling
>>>> than on hw polling.
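
On the kernel side, Dmitry's last-resort option from earlier in the
thread might look like the sketch below; the compatible strings are
placeholders, not a vetted list of polling-capable SoCs:

/*
 * Last-resort detection: gate polling support on the machine
 * compatible when neither the DSP firmware nor an SoC-specific
 * compatible string can report the capability.
 */
#include <linux/of.h>

static bool fastrpc_polling_supported(void)
{
        /* placeholder list of SoCs assumed to support DSP polling */
        return of_machine_is_compatible("qcom,sm8550") ||
               of_machine_is_compatible("qcom,sm8650");
}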
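
For context, the in-driver polling described in the original commit
message boils down to something like the following; all names and the
timeout value are illustrative, not the actual patch:

/*
 * Hedged sketch: after sending the invoke message, spin on a
 * shared-memory completion word that the DSP updates, and report a
 * timeout if the marker never appears within the polling window.
 */
#include <linux/compiler.h>
#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/ktime.h>
#include <linux/types.h>

#define FASTRPC_POLL_TIMEOUT_US 4000    /* illustrative budget */
#define FASTRPC_POLL_DONE       1U      /* illustrative done marker */

static int fastrpc_poll_for_completion(u32 *poll_addr)
{
        ktime_t timeout = ktime_add_us(ktime_get(), FASTRPC_POLL_TIMEOUT_US);

        do {
                /* DSP writes the done marker when the call completes */
                if (READ_ONCE(*poll_addr) == FASTRPC_POLL_DONE)
                        return 0;       /* late glink response is ignored */
                udelay(1);
        } while (ktime_before(ktime_get(), timeout));

        return -ETIMEDOUT;
}

If the marker never shows up, the driver would fall back to the
existing wait for the glink response, which per the commit message the
DSP still sends unconditionally.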