On Mon, 2025-01-27 at 14:34 -0800, Bart Van Assche wrote:
> Energy efficiency is very important for battery-powered devices like
> smartphones. In battery-powered devices, CPU cores and peripherals
> support multiple power states. A lower power state is entered if no
> work is pending. Typically the more power that is saved, the more time
> it takes to exit the power saving state.
>
> Switching to a lower power state if no work is pending works well for
> CPU-intensive tasks but is not optimal for latency-sensitive tasks
> like block I/O with a low queue depth. If a CPU core transitions to a
> lower power state after each I/O has been submitted and has to be
> woken up every time an I/O completes, this can increase I/O latency
> significantly. The cpu_latency_qos_update_request(..., max_latency)
> function can be used to specify a maximum wakeup latency and hence
> can be used to prevent a transition to a lower power state before an
> I/O completes. However, cpu_latency_qos_update_request() is too
> expensive to be called from the I/O submission path for every request.
>
> In the UFS driver the cpu_latency_qos_update_request() is called from
> the devfreq_dev_profile::target() callback. That callback checks the
> hba->clk_scaling.active_reqs variable, a variable that tracks the
> number of outstanding commands. Updates of that variable are protected
> by a spinlock and hence are a contention point. Having to maintain
> this or a similar infrastructure in every block driver is not ideal.
>
> A possible solution is to tie QoS updates to the runtime-power
> management (RPM) mechanism. The block layer interacts as follows with
> the RPM mechanism:
> * pm_runtime_mark_last_busy(dev) is called by the block layer upon
>   request completion. This call updates dev->power.last_busy. The RPM
>   mechanism uses this information to decide when to check whether a
>   block device can be suspended.
> * pm_request_resume() is called by the block layer if a block device
>   has been runtime suspended and needs to be resumed.
> * If the RPM timer expires, the block driver .runtime_suspend()
>   callback is invoked. The .runtime_suspend() callback is expected to
>   call blk_pre_runtime_suspend() and blk_post_runtime_suspend().
>   blk_pre_runtime_suspend() checks whether q->q_usage_counter is zero.
>
> It is not my goal to replace the iowait boost mechanism. This
> mechanism boosts the CPU frequency when a task that is in the iowait
> state wakes up after the I/O operation completes.
>
> The purpose of this session is to discuss the following:
> * A solution that exists in the block layer instead of in block
>   drivers.
> * A solution that does not cause contention between block layer
>   hardware queues.
> * A solution that does not measurably increase the number of CPU
>   cycles per I/O.
> * A solution that does not require users to provide I/O latency
>   estimates.
>
> See also:
> * https://www.kernel.org/doc/Documentation/power/pm_qos_interface.txt
> * Tero Kristo, [PATCHv2 0/2] blk-mq: add CPU latency limit control,
>   2024-10-18
>   (https://lore.kernel.org/linux-block/20241018075416.436916-1-tero.kristo@xxxxxxxxxxxxxxx/).
> * The cpu_latency_constraints definition in kernel/power/qos.c.

Sounds like a really interesting problem and topic. Thank you for
suggesting it. :)

Yeah, we need to do something about power consumption.

Thanks,
Slava.
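
For reference, the CPU wakeup latency limit discussed above can also be
requested from user space through /dev/cpu_dma_latency, the interface
described in pm_qos_interface.txt. The following is only a minimal
sketch of that user-space side, not a proposal for the block layer. The
helper names cpu_latency_limit_hold() and cpu_latency_limit_release()
are hypothetical, and the path is passed as a parameter (an assumption
made for illustration) so that the helper can be tried against a
regular file without root privileges.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to keep the CPU wakeup latency at
 * or below max_latency_us microseconds. The request stays active for as
 * long as the returned file descriptor remains open.
 *
 * "path" is normally "/dev/cpu_dma_latency". It is a parameter only so
 * that the helper can be exercised against a regular file.
 *
 * Returns the open file descriptor, or -1 on error.
 */
int cpu_latency_limit_hold(const char *path, int32_t max_latency_us)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	/* The device accepts the limit as a binary 32-bit signed value. */
	if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
	    (ssize_t)sizeof(max_latency_us)) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: closing the descriptor drops the request. */
void cpu_latency_limit_release(int fd)
{
	if (fd >= 0)
		close(fd);
}
```

A latency-sensitive application would call
cpu_latency_limit_hold("/dev/cpu_dma_latency", limit) before issuing
low queue depth I/O and cpu_latency_limit_release() afterwards. This
holds the constraint for the whole interval rather than per request,
which is exactly the coarse granularity the proposal above tries to
improve on.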