Re: [PATCH RFC] scsi: ufs-core: Do not use clk_scaling_lock in ufshcd_queuecommand()

On 11/4/2021 10:10 AM, Bart Van Assche wrote:
> On 11/4/21 7:23 AM, Asutosh Das (asd) wrote:
>> In the current clock scaling code, the expectation is to scale up as soon as possible.
>>
>> For example, say the current gear is G1, there are pending requests in the queue, the DBR is empty, and a decision is made to scale up. During scale-up, if the queues are frozen, wouldn't those requests be issued to the driver and executed in G1 instead of G4?
>> I think this would lead to higher run-to-run variance in performance.

> Hi Asutosh,
>
> My understanding of the current clock scaling implementation is as follows (please correct me if I got anything wrong):
> * ufshcd_clock_scaling_prepare() is called before clock scaling happens
>    and ufshcd_clock_scaling_unprepare() is called after clock scaling has
>    finished.
> * ufshcd_clock_scaling_prepare() calls ufshcd_scsi_block_requests() and
>    ufshcd_wait_for_doorbell_clr().
> * ufshcd_wait_for_doorbell_clr() waits until both the UTP Transfer
>    Request List Doorbell Register and UTP Task Management Request List
>    DoorBell Register are zero. Hence, it waits until all pending SCSI
>    commands, task management commands and device commands have finished.

> As far as I can see, from a conceptual viewpoint, there is no difference
> between calling ufshcd_wait_for_doorbell_clr() and freezing the request
> queues. There is an implementation difference, however:
> blk_mq_freeze_queue() waits for an RCU grace period. This can introduce
> an additional delay of several milliseconds compared to
> ufshcd_wait_for_doorbell_clr(). If this is a concern I can look into
> expediting the RCU grace period during clock scaling.
>
> Thanks,
>
> Bart.

Hi Bart,
The current scaling code invokes scsi_block_requests() in prepare, which blocks incoming requests right away, so any un-issued requests wouldn't be issued.

I'm not sure that exact behavior would be preserved if blk_freeze_queue_start() were used and scsi_block_requests() removed.
I see that it invokes blk_mq_run_hw_queues():
void blk_freeze_queue_start(struct request_queue *q)
{
        mutex_lock(&q->mq_freeze_lock);
        if (++q->mq_freeze_depth == 1) {
                percpu_ref_kill(&q->q_usage_counter);
                mutex_unlock(&q->mq_freeze_lock);
                if (queue_is_mq(q))
                        blk_mq_run_hw_queues(q, false);
        } else {
                mutex_unlock(&q->mq_freeze_lock);
        }
}

Would blk_mq_run_hw_queues() issue any pending requests in the queue? If so, those requests would complete in the unchanged power mode, which is not what we want.

-asd

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project


