Hi, On Tuesday, December 20, 2016 03:01:48 PM Linus Walleij wrote: > HACK ALERT: DO NOT MERGE THIS! IT IS A FYI PATCH FOR DISCUSSION > ONLY. > > This hack switches the MMC/SD subsystem from using the legacy blk > layer to using blk-mq. It does this by registering one single > hardware queue, since MMC/SD has only one command pipe. I kill > off the worker thread altogether and let the MQ core logic fire > sleepable requests directly into the MMC core. > > We emulate the 2 elements deep pipeline by specifying queue depth > 2, which is an elaborate lie that makes the block layer issue > another request while a previous request is in transit. It't not > neat but it works. > > As the pipeline needs to be flushed by pushing in a NULL request > after the last block layer request I added a delayed work with a > timeout of zero. This will fire as soon as the block layer stops > pushing in requests: as long as there are new requests the MQ > block layer will just repeatedly cancel this pipeline flush work > and push new requests into the pipeline, but once the requests > stop coming the NULL request will be flushed into the pipeline. > > It's not pretty but it works... Look at the following performance > statistics: > > BEFORE this patch: > > time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024 > 1024+0 records in > 1024+0 records out > 1073741824 bytes (1.0GB) copied, 45.145874 seconds, 22.7MB/s > real 0m 45.15s > user 0m 0.02s > sys 0m 7.51s > > mount /dev/mmcblk0p1 /mnt/ > cd /mnt/ > time find . > /dev/null > real 0m 3.70s > user 0m 0.29s > sys 0m 1.63s > > AFTER this patch: > > time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024 > 1024+0 records in > 1024+0 records out > 1073741824 bytes (1.0GB) copied, 45.285431 seconds, 22.6MB/s > real 0m 45.29s > user 0m 0.02s > sys 0m 6.58s > > mount /dev/mmcblk0p1 /mnt/ > cd /mnt/ > time find . > /dev/null > real 0m 4.37s > user 0m 0.27s > sys 0m 1.65s > > The results are consistent. > > As you can see, for a straight dd-like task, we get more or less the > same nice parallelism as for the old framework. I have confirmed > through debugprints that indeed this is because the two-stage pipeline > is full at all times. > > However, for spurious reads in the find command, we already see a big > performance regression. > > This is because there are many small operations requireing a flush of > the pipeline, which used to happen immediately with the old block > layer interface code that used to pull a few NULL requests off the > queue and feed them into the pipeline immediately after the last > request, but happens after the delayed work is executed in this > new framework. The delayed work is never quick enough to terminate > all these small operations even if we schedule it immediately after > the last request. > > AFAICT the only way forward to provide proper performance with MQ > for MMC/SD is to get the requests to complete out-of-sync, i.e. when > the driver calls back to MMC/SD core to notify that a request is > complete, it should not notify any main thread with a completion > as is done right now, but instead directly call blk_end_request_all() > and only schedule some extra communication with the card if necessary > for example to handle an error condition. To do this I think that SCSI subsystem-like error handling should be added to MMC core (in my uber-ugly blk-mq patches I just commented most of error handling out and added direct request completion call). > This rework needs a bigger rewrite so we can get rid of the paradigm > of the block layer "driving" the requests throgh the pipeline. IMHO for now blk-mq should be added as an add-on to MMC core with leaving the old code in-place (again SCSI layer serves as inspiration for that). It should be easier to test and merge blk-mq support without causing regressions this way (i.e. blk-mq handling of schedulers is only now being worked on). You may also consider re-basing your work on Adrian's swcmdq patches (up to "mmc: core: Do not prepare a new request twice" patch) as they cleanup MMC core queuing code significantly (some of the patches have been merged upstream recently): http://git.infradead.org/users/ahunter/linux-sdhci.git/shortlog/refs/heads/swcmdq Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html