On Fri, Nov 22, 2019 at 10:50 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:

> I suppose to make the submission non-blocking, all operations that
> currently block in the submission path may have to be changed first.
>
> For the case of a partition switch (same for retune), I suppose
> something like this can be done:
>
> - in queue_rq() check whether a partition switch is needed. If not,
>   submit the current rq
> - if a partition switch is needed, submit the partition switch cmd
>   instead, and return busy status
> - when the completion arrives for the partition switch, call back into
>   blk_mq to have it call queue_rq again.
>
> Or possibly even (this might not be possible without significant
> restructuring):
>
> - when preparing a request that would require a partition switch,
>   insert another meta-request to switch the partition ahead of it.
>
> I do realize that this is a significant departure from how it was done
> in the past, but it seems cleaner that way to me.

This partition business really needs a proper overhaul.

I outlined the work elsewhere, but the problem is that the eMMC
"partitions", such as the boot partitions and the use-case-defined
"general" partition (notice SD cards do not have this problem), are
badly integrated with the Linux partition manager.

Instead of being mapped 1:1 to Linux partitions, they are separate
block devices with their own block queue, while still having a name
that suggests they are just a partition of the device. Which they are.

The only thing peculiar about them is that the firmware in the card is
aware of them. I think the partitions that are not primary may trade
speed for update correctness, such that e.g. boot partitions may have
extra redundant pages in the device so that they never become
corrupted. But card vendors would have to comment.
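
As a rough illustration of the three quoted queue_rq() steps, here is a toy model in Python. It is not kernel code and uses no blk-mq API: ToyQueue, its queue_rq() method, switch_done() and the "ok"/"busy" return values are all hypothetical stand-ins, and a request is just a (partition, lba) pair.

```python
from collections import deque


class ToyQueue:
    """Toy dispatch loop; every name here is illustrative, not kernel API."""

    def __init__(self, requests):
        self.pending = deque(requests)   # (partition, lba) pairs, in order
        self.current_part = 0            # partition the card is switched to
        self.switch_in_flight = None     # target of an outstanding switch
        self.log = []                    # commands "sent to the card"

    def queue_rq(self, rq):
        part, lba = rq
        if part == self.current_part:
            # Step 1: no switch needed, submit the request itself.
            self.log.append(("io", part, lba))
            return "ok"
        # Step 2: submit the switch command instead and report busy,
        # so the request stays queued rather than blocking here.
        self.switch_in_flight = part
        self.log.append(("switch", part))
        return "busy"

    def switch_done(self):
        # Step 3: completion of the switch; record the new partition
        # and have the dispatch loop call queue_rq() again.
        self.current_part = self.switch_in_flight
        self.switch_in_flight = None
        self.run()

    def run(self):
        # Dispatch until the queue is empty or a switch is outstanding.
        while self.pending and self.switch_in_flight is None:
            if self.queue_rq(self.pending[0]) == "ok":
                self.pending.popleft()


def drain(queue):
    """Run the queue, completing each outstanding switch immediately."""
    queue.run()
    while queue.switch_in_flight is not None:
        queue.switch_done()
    return queue.log
```

The point of the model is only that queue_rq() never waits: a request that needs a switch stays at the head of the queue and is retried once the switch completes.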

This has peculiar side effects, yielding weird user experiences. For
example,

  dd if=/dev/mmcblk0 of=my-mmc-backup.img

will actually NOT make a backup of the whole device, only of the
primary partition.

This should be fixed. My preferred solution would be to just catenate
the logical blocks for these partitions beyond those of the primary
partition, stash these offsets away somewhere and, when they are
accessed, insert special partition switch commands into the block
scheduler just like you said.

Right now the MMC core is instead trying to coordinate the uses of
different partitions by arbitrating requests from typically 4
different block devices, which isn't very good, to say the least. Also
each block device eats memory and it should really just be one block
device.

Yours,
Linus Walleij
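
A minimal sketch of the catenation idea described above, written as a Python model rather than kernel code. The partition names and sizes are made up for illustration, and HwPartition, CatenatedMap and plan_commands are hypothetical names: the hardware partitions are laid end to end in one logical block space, the start offsets are stashed in a list, and a switch command is emitted whenever an access lands in a different partition than the one currently selected.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class HwPartition:
    name: str      # illustrative label, e.g. "user" or "boot0"
    nblocks: int   # size in logical blocks (made-up numbers below)


class CatenatedMap:
    """One logical block space covering all hardware partitions."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.starts = []                 # stashed start offset of each one
        offset = 0
        for part in self.partitions:
            self.starts.append(offset)
            offset += part.nblocks
        self.total_blocks = offset

    def resolve(self, lba):
        """Map a device-wide LBA to (partition index, LBA within it)."""
        if not 0 <= lba < self.total_blocks:
            raise ValueError(f"LBA {lba} outside device")
        idx = bisect_right(self.starts, lba) - 1
        return idx, lba - self.starts[idx]


def plan_commands(cmap, lbas, current=0):
    """Turn single-block accesses into commands, adding switches as needed."""
    cmds = []
    for lba in lbas:
        idx, local = cmap.resolve(lba)
        if idx != current:
            cmds.append(("switch", cmap.partitions[idx].name))
            current = idx
        cmds.append(("io", cmap.partitions[idx].name, local))
    return cmds


# Primary area first, the other partitions catenated beyond it.
emmc = CatenatedMap([
    HwPartition("user", 1000),
    HwPartition("boot0", 8),
    HwPartition("boot1", 8),
    HwPartition("gp0", 16),
])
```

With such a layout, reading the single device from block 0 to total_blocks would cover every partition, which is the behaviour the dd example is missing today.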