On Wed, Aug 14, 2019 at 12:32:44PM +0200, Martijn Coenen wrote:
> Since Android Q, the creation and configuration of loop devices is in
> the critical path of device boot. We found that the configuration of
> loop devices is pretty slow, because many ioctl()'s involve freezing the
> block queue, which in turn needs to wait for an RCU grace period. On
> Android devices we've observed up to 60ms for the creation and
> configuration of a single loop device; as we anticipate creating many
> more in the future, we'd like to avoid this delay.
>

Another option is to not switch to q_usage_counter's percpu mode until
the loop device becomes Lo_bound; this approach may be cleaner.

Something like the following patch:

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a7461f482467..8791f9242583 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1015,6 +1015,9 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	 */
 	bdgrab(bdev);
 	mutex_unlock(&loop_ctl_mutex);
+
+	percpu_ref_switch_to_percpu(&lo->lo_queue->q_usage_counter);
+
 	if (partscan)
 		loop_reread_partitions(lo, bdev);
 	if (claimed_bdev)
@@ -1171,6 +1174,8 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 	lo->lo_state = Lo_unbound;
 	mutex_unlock(&loop_ctl_mutex);
 
+	percpu_ref_switch_to_atomic(&lo->lo_queue->q_usage_counter, NULL);
+
 	/*
 	 * Need not hold loop_ctl_mutex to fput backing file.
 	 * Calling fput holding loop_ctl_mutex triggers a circular
@@ -2003,6 +2008,12 @@ static int loop_add(struct loop_device **l, int i)
 	}
 	lo->lo_queue->queuedata = lo;
 
+	/*
+	 * cheat block layer for not switching to q_usage_counter's
+	 * percpu mode before loop becomes Lo_bound
+	 */
+	blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, lo->lo_queue);
+
 	blk_queue_max_hw_sectors(lo->lo_queue, BLK_DEF_MAX_SECTORS);
 
 	/*

Thanks,
Ming