Re: [LSF/MM/BPF Topic] Energy-Efficient I/O

slava@xxxxxxxxxxx · Tue, 28 Jan 2025 11:18:09 -0800

On Mon, 2025-01-27 at 14:34 -0800, Bart Van Assche wrote:
> Energy efficiency is very important for battery-powered devices like
> smartphones. In battery-powered devices, CPU cores and peripherals
> support multiple power states. A lower power state is entered if no
> work is pending. Typically the more power that is saved, the more time
> it takes to exit the power saving state.
> 
> Switching to a lower power state if no work is pending works well for
> CPU-intensive tasks but is not optimal for latency-sensitive tasks like
> block I/O with a low queue depth. If a CPU core transitions to a lower
> power state after each I/O has been submitted and has to be woken up
> every time an I/O completes, this can increase I/O latency
> significantly. The cpu_latency_qos_update_request(..., max_latency)
> function can be used to specify a maximum wakeup latency and hence can
> be used to prevent a transition to a lower power state before an I/O
> completes. However, cpu_latency_qos_update_request() is too expensive
> to be called from the I/O submission path for every request.
> 
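To make the latency constraint above concrete, here is a minimal userspace
sketch of how a maximum wakeup latency restricts which idle states may be
entered. This is not kernel code: the state names, the exit latencies and
deepest_allowed_state() are hypothetical illustration values, and the real
decision is made by the cpuidle governor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle states, ordered from shallowest to deepest. */
struct idle_state {
	const char *name;
	unsigned int exit_latency_us;	/* made-up values for illustration */
};

static const struct idle_state idle_states[] = {
	{ "shallow", 2 },
	{ "medium", 50 },
	{ "deep", 500 },
};

#define NR_IDLE_STATES (sizeof(idle_states) / sizeof(idle_states[0]))

/*
 * Return the index of the deepest idle state whose exit latency does not
 * exceed limit_us, or -1 if no state satisfies the constraint.
 */
static int deepest_allowed_state(unsigned int limit_us)
{
	int best = -1;

	for (size_t i = 0; i < NR_IDLE_STATES; i++) {
		/* The table must be sorted by increasing exit latency. */
		assert(i == 0 || idle_states[i - 1].exit_latency_us <=
				 idle_states[i].exit_latency_us);
		if (idle_states[i].exit_latency_us <= limit_us)
			best = (int)i;
	}
	return best;
}
```

With these made-up numbers, a limit of 60 us still allows "medium" but rules
out "deep", which is the effect a latency request is meant to have while an
I/O is outstanding.
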
> In the UFS driver, cpu_latency_qos_update_request() is called from
> the devfreq_dev_profile::target() callback. That callback checks the
> hba->clk_scaling.active_reqs variable, a variable that tracks the
> number of outstanding commands. Updates of that variable are protected
> by a spinlock and hence are a contention point. Having to maintain this
> or a similar infrastructure in every block driver is not ideal.
> 
> A possible solution is to tie QoS updates to the runtime-power
> management (RPM) mechanism. The block layer interacts as follows with
> the RPM mechanism:
> * pm_runtime_mark_last_busy(dev) is called by the block layer upon
>    request completion. This call updates dev->power.last_busy. The RPM
>    mechanism uses this information to decide when to check whether a
>    block device can be suspended.
> * pm_request_resume() is called by the block layer if a block device
>    has been runtime suspended and needs to be resumed.
> * If the RPM timer expires, the block driver .runtime_suspend()
>    callback is invoked. The .runtime_suspend() callback is expected to
>    call blk_pre_runtime_suspend() and blk_post_runtime_suspend().
>    blk_pre_runtime_suspend() checks whether q->q_usage_counter is zero.
> 
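One possible reading of "tie QoS updates to RPM" can be sketched as a small
userspace model. This is only an illustration under stated assumptions, not
the proposed implementation: struct sim_dev and all sim_*() helpers are
hypothetical, resume is modelled as synchronous, and the inflight counter
stands in for q->q_usage_counter.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NO_LIMIT (-1)	/* no CPU latency constraint active */

/* Hypothetical model of a block device under runtime PM. */
struct sim_dev {
	bool suspended;
	long last_busy_ms;		/* models dev->power.last_busy */
	long autosuspend_delay_ms;
	int inflight;			/* stands in for q->q_usage_counter */
	int active_limit_us;		/* limit applied while resumed */
	int cpu_latency_limit_us;	/* currently applied limit */
	unsigned int qos_updates;	/* how often the limit was changed */
};

/* Models the expensive QoS update; counts how often it happens. */
static void sim_set_limit(struct sim_dev *d, int limit_us)
{
	d->cpu_latency_limit_us = limit_us;
	d->qos_updates++;
}

static void sim_init(struct sim_dev *d, long delay_ms, int active_limit_us)
{
	assert(delay_ms >= 0 && active_limit_us >= 0);
	d->suspended = true;
	d->last_busy_ms = 0;
	d->autosuspend_delay_ms = delay_ms;
	d->inflight = 0;
	d->active_limit_us = active_limit_us;
	d->cpu_latency_limit_us = SIM_NO_LIMIT;
	d->qos_updates = 0;
}

/* Submission: the QoS limit is only touched on a suspended->active edge. */
static void sim_submit(struct sim_dev *d)
{
	if (d->suspended) {
		d->suspended = false;
		sim_set_limit(d, d->active_limit_us);
	}
	d->inflight++;
}

/* Completion: only a timestamp update, as with marking last busy. */
static void sim_complete(struct sim_dev *d, long now_ms)
{
	assert(d->inflight > 0);
	d->inflight--;
	d->last_busy_ms = now_ms;
}

/* Timer expiry: suspend and drop the limit if idle for long enough. */
static bool sim_try_suspend(struct sim_dev *d, long now_ms)
{
	if (d->suspended || d->inflight > 0 ||
	    now_ms - d->last_busy_ms < d->autosuspend_delay_ms)
		return false;
	d->suspended = true;
	sim_set_limit(d, SIM_NO_LIMIT);
	return true;
}
```

In this model the per-request path never changes the QoS limit, so the number
of updates depends on suspend/resume transitions rather than on the number of
requests. The cost is that the limit stays in place during idle gaps shorter
than the autosuspend delay.
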
> It is not my goal to replace the iowait boost mechanism. This
> mechanism boosts the CPU frequency when a task that is in the iowait
> state wakes up after the I/O operation completes.
> 
> The purpose of this session is to discuss the following:
> * A solution that exists in the block layer instead of in block
>    drivers.
> * A solution that does not cause contention between block layer
>    hardware queues.
> * A solution that does not measurably increase the number of CPU
>    cycles per I/O.
> * A solution that does not require users to provide I/O latency
>    estimates.
> 
> See also:
> * https://www.kernel.org/doc/Documentation/power/pm_qos_interface.txt
> * Tero Kristo, [PATCHv2 0/2] blk-mq: add CPU latency limit control,
>    2024-10-18
>    (https://lore.kernel.org/linux-block/20241018075416.436916-1-tero.kristo@xxxxxxxxxxxxxxx/).
> * The cpu_latency_constraints definition in kernel/power/qos.c.
> 

Sounds like a really interesting problem and topic. Thank you for
suggesting it. :) Yeah, we need to do something about power consumption.

Thanks,
Slava.