Re: [PATCH V12 1/8] block, bfq: split sync bfq_queues on a per-actuator basis

Paolo Valente <paolo.valente@xxxxxxxxxx> · Thu, 29 Dec 2022 21:35:24 +0100

> Il giorno 27 dic 2022, alle ore 02:37, Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> ha scritto:
> 
> On 12/23/22 00:21, Paolo Valente wrote:
>> Single-LUN multi-actuator SCSI drives, as well as all multi-actuator
>> SATA drives appear as a single device to the I/O subsystem [1].  Yet
>> they address commands to different actuators internally, as a function
>> of Logical Block Addressing (LBAs). A given sector is reachable by
>> only one of the actuators. For example, Seagate’s Serial Advanced
>> Technology Attachment (SATA) version contains two actuators and maps
>> the lower half of the SATA LBA space to the lower actuator and the
>> upper half to the upper actuator.
>> 
>> Evidently, to fully utilize actuators, no actuator must be left idle
>> or underutilized while there is pending I/O for it. The block layer
>> must somehow control the load of each actuator individually. This
>> commit lays the ground for allowing BFQ to provide such a per-actuator
>> control.
>> 
>> BFQ associates an I/O-request sync bfq_queue with each process doing
>> synchronous I/O, or with a group of processes, in case of queue
>> merging. Then BFQ serves one bfq_queue at a time. While in service, a
>> bfq_queue is emptied in request-position order. Yet the same process,
>> or group of processes, may generate I/O for different actuators. In
>> this case, different streams of I/O (each for a different actuator)
>> get all inserted into the same sync bfq_queue. So there is basically
>> no individual control on when each stream is served, i.e., on when the
>> I/O requests of the stream are picked from the bfq_queue and
>> dispatched to the drive.
>> 
>> This commit enables BFQ to control the service of each actuator
>> individually for synchronous I/O, by simply splitting each sync
>> bfq_queue into N queues, one for each actuator. In other words, a sync
>> bfq_queue is now associated to a pair (process, actuator). As a
>> consequence of this split, the per-queue proportional-share policy
>> implemented by BFQ will guarantee that the sync I/O generated for each
>> actuator, by each process, receives its fair share of service.
>> 
>> This is just a preparatory patch. If the I/O of the same process
>> happens to be sent to different queues, then each of these queues may
>> undergo queue merging. To handle this event, the bfq_io_cq data
>> structure must be properly extended. In addition, stable merging must
>> be disabled to avoid loss of control on individual actuators. Finally,
>> also async queues must be split. These issues are described in detail
>> and addressed in next commits. As for this commit, although multiple
>> per-process bfq_queues are provided, the I/O of each process or group
>> of processes is still sent to only one queue, regardless of the
>> actuator the I/O is for. The forwarding to distinct bfq_queues will be
>> enabled after addressing the above issues.
>> 
>> [1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-optimizations-for-multi-actuator-sata-hard-drives/
>> 
>> Signed-off-by: Gabriele Felici <felicigb@xxxxxxxxx>
>> Signed-off-by: Carmine Zaccagnino <carmine@xxxxxxxxxxxxxxx>
>> Signed-off-by: Paolo Valente <paolo.valente@xxxxxxxxxx>
> 
> One styles nit below.
> 
> Reviewed-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
> 
>> @@ -690,14 +700,25 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>> 		limit = (limit * depth) >> bfqd->full_depth_shift;
>> 	}
>> 
>> -	/*
>> -	 * Does queue (or any parent entity) exceed number of requests that
>> -	 * should be available to it? Heavily limit depth so that it cannot
>> -	 * consume more available requests and thus starve other entities.
>> -	 */
>> -	if (bfqq && bfqq_request_over_limit(bfqq, limit))
>> -		depth = 1;
>> +	for (act_idx = 0; act_idx < bfqd->num_actuators; act_idx++) {
>> +		struct bfq_queue *bfqq;
>> +
>> +		if (bic)
>> +			bfqq = bic_to_bfqq(bic, op_is_sync(opf), act_idx);
>> +		else
>> +			break;
>> 
>> +		/*
>> +		 * Does queue (or any parent entity) exceed number of
>> +		 * requests that should be available to it? Heavily
>> +		 * limit depth so that it cannot consume more
>> +		 * available requests and thus starve other entities.
>> +		 */
>> +		if (bfqq && bfqq_request_over_limit(bfqq, limit)) {
>> +			depth = 1;
>> +			break;
>> +		}
> 
> You could reverse the if condition to make this cleaner, or even better,
> include the bic test in the for loop:
> 
> for (act_idx = 0; bic && act_idx < bfqd->num_actuators; act_idx++) {
> 	struct bfq_queue *bfqq;
> 
> 	/*
> 	 * Does queue (or any parent entity) exceed number of
> 	 * requests that should be available to it? Heavily
> 	 * limit depth so that it cannot consume more
> 	 * available requests and thus starve other entities.
> 	 */
> 	bfqq = bic_to_bfqq(bic, op_is_sync(opf), act_idx);
> 	if (bfqq && bfqq_request_over_limit(bfqq, limit)) {
> 		depth = 1;
> 		break;
> 	}
> }
> 

Done, thanks for this improvement.

Sending a V13, with all patches tagged as reviewed.

Thanks,
Paolo

> 
> -- 
> Damien Le Moal
> Western Digital Research