Energy efficiency is very important for battery-powered devices like
smartphones. In such devices, CPU cores and peripherals
support multiple power states. A lower power state is entered if no work
is pending. Typically the more power that is saved, the more time it
takes to exit the power saving state.

Switching to a lower power state if no work is pending works well for
CPU-intensive tasks but is not optimal for latency-sensitive tasks like
block I/O with a low queue depth. If a CPU core transitions to a lower
power state after each I/O has been submitted and has to be woken up
every time an I/O completes, this can increase I/O latency
significantly. The cpu_latency_qos_update_request(..., max_latency)
function can be used to specify a maximum wakeup latency and hence can
be used to prevent a transition to a lower power state before an I/O
completes. However, cpu_latency_qos_update_request() is too expensive to
be called from the I/O submission path for every request.
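
To illustrate how a maximum wakeup latency constrains power state
selection, here is a minimal userspace model in C. It is not kernel
code: struct idle_state, the state names, the latency and power numbers
and pick_idle_state() are all hypothetical and were chosen for
illustration only. It is only loosely analogous to how an idle governor
honors a latency limit.

```c
/* Userspace model, not kernel code. All identifiers are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct idle_state {
	const char *name;
	uint32_t exit_latency_us;	/* time needed to leave this state */
	uint32_t power_mw;		/* power drawn while in this state */
};

/*
 * Made-up numbers, ordered from shallowest to deepest: the more power
 * a state saves, the longer it takes to exit that state.
 */
static const struct idle_state idle_states[] = {
	{ "wfi",            1,    100 },
	{ "core-retention", 50,   40  },
	{ "core-off",       500,  10  },
	{ "cluster-off",    3000, 2   },
};

#define NR_IDLE_STATES (sizeof(idle_states) / sizeof(idle_states[0]))

/*
 * Return the index of the deepest state whose exit latency does not
 * exceed max_latency_us. State 0 is always allowed so that an idle CPU
 * has somewhere to go.
 */
static size_t pick_idle_state(const struct idle_state *states, size_t n,
			      uint32_t max_latency_us)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++) {
		if (states[i].exit_latency_us > max_latency_us)
			break;
		best = i;
	}
	return best;
}
```

With these made-up numbers, a limit of 100 us allows only the two
shallowest states, while the absence of a limit allows the deepest one.
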
In the UFS driver, cpu_latency_qos_update_request() is called from
the devfreq_dev_profile::target() callback. That callback checks the
hba->clk_scaling.active_reqs variable, a variable that tracks the number
of outstanding commands. Updates of that variable are protected by a
spinlock and hence are a contention point. Having to maintain this or a
similar infrastructure in every block driver is not ideal.

A possible solution is to tie QoS updates to the runtime power
management (RPM) mechanism. The block layer interacts as follows with
the RPM mechanism:
* pm_runtime_mark_last_busy(dev) is called by the block layer upon
request completion. This call updates dev->power.last_busy. The RPM
mechanism uses this information to decide when to check whether a
block device can be suspended.
* pm_request_resume() is called by the block layer if a block device has
been runtime suspended and needs to be resumed.
* If the RPM timer expires, the block driver .runtime_suspend() callback
is invoked. The .runtime_suspend() callback is expected to call
blk_pre_runtime_suspend() and blk_post_runtime_suspend().
blk_pre_runtime_suspend() checks whether q->q_usage_counter is zero.
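
The interaction described above, combined with the idea of tying QoS
updates to RPM transitions, can be sketched as a small userspace model.
This is only one way such a scheme might look, not an existing kernel
implementation: struct model_dev, the model_*() helpers and
NO_LATENCY_LIMIT are hypothetical names, in_flight is a simplified
stand-in for q->q_usage_counter, and resume is modeled as synchronous
although pm_request_resume() is asynchronous.

```c
/* Userspace model, not kernel code. All identifiers are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_LATENCY_LIMIT INT32_MAX	/* models "no constraint" */

struct model_dev {
	bool suspended;
	unsigned int in_flight;		/* stand-in for q->q_usage_counter */
	uint64_t last_busy_ms;		/* stand-in for dev->power.last_busy */
	uint64_t autosuspend_delay_ms;
	int32_t active_limit_us;	/* limit to request while active */
	int32_t cpu_latency_limit_us;	/* limit currently requested */
	unsigned int qos_updates;	/* number of (expensive) QoS updates */
};

static void model_init(struct model_dev *d, uint64_t delay_ms,
		       int32_t active_limit_us)
{
	d->suspended = true;
	d->in_flight = 0;
	d->last_busy_ms = 0;
	d->autosuspend_delay_ms = delay_ms;
	d->active_limit_us = active_limit_us;
	d->cpu_latency_limit_us = NO_LATENCY_LIMIT;
	d->qos_updates = 0;
}

/* Models the expensive QoS update; counted so that its use is visible. */
static void model_set_limit(struct model_dev *d, int32_t limit_us)
{
	if (d->cpu_latency_limit_us == limit_us)
		return;
	d->cpu_latency_limit_us = limit_us;
	d->qos_updates++;
}

/* Submission: resume if needed; the limit changes only on resume. */
static void model_submit(struct model_dev *d)
{
	if (d->suspended) {
		d->suspended = false;
		model_set_limit(d, d->active_limit_us);
	}
	d->in_flight++;
}

/* Completion: only record the time, as pm_runtime_mark_last_busy() does. */
static void model_complete(struct model_dev *d, uint64_t now_ms)
{
	assert(d->in_flight > 0);
	d->in_flight--;
	d->last_busy_ms = now_ms;
}

/* Timer expiry: suspend only if idle long enough with nothing pending. */
static bool model_timer(struct model_dev *d, uint64_t now_ms)
{
	if (d->suspended || d->in_flight > 0)
		return false;
	if (now_ms - d->last_busy_ms < d->autosuspend_delay_ms)
		return false;
	d->suspended = true;
	model_set_limit(d, NO_LATENCY_LIMIT);
	return true;
}
```

In this model the latency limit changes only when the device resumes or
suspends, so the per-request paths touch a counter and a timestamp but
never perform a QoS update.
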
It is not my goal to replace the iowait boost mechanism. This mechanism
boosts the CPU frequency when a task that is in the iowait state wakes
up after the I/O operation completes.

The purpose of this session is to discuss the following:
* A solution that exists in the block layer instead of in block drivers.
* A solution that does not cause contention between block layer hardware
queues.
* A solution that does not measurably increase the number of CPU cycles
per I/O.
* A solution that does not require users to provide I/O latency
estimates.

See also:
* https://www.kernel.org/doc/Documentation/power/pm_qos_interface.txt
* Tero Kristo, [PATCHv2 0/2] blk-mq: add CPU latency limit control,
2024-10-18
(https://lore.kernel.org/linux-block/20241018075416.436916-1-tero.kristo@xxxxxxxxxxxxxxx/).
* The cpu_latency_constraints definition in kernel/power/qos.c.

Thanks,
Bart.