On 12/04/18 11:53, Shawn Lin wrote:
> On 2018/4/12 15:03, Adrian Hunter wrote:
>> On 12/04/18 03:43, Shawn Lin wrote:
>>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>>> A simple continuous background I/O could 100% make the system stuck for
>>>>>>> a long time in my system when an unbind for the controller driver happens
>>>>>>> simultaneously. See:
>>>>>>>
>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>
>>>>>>> The reason is all pending requests wait for timeout one by one, but
>>>>>>> never propagate BLK_STS_IOERR in the first place when kicked from the queue.
>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>>>
>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@xxxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>>
>>>>>>> drivers/mmc/core/bus.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>> pr_info("%s: card %04x removed\n",
>>>>>>> mmc_hostname(card->host), card->rca);
>>>>>>> }
>>>>>>> + mmc_card_set_removed(card);
>>>>>>
>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>>> claimed the host. Of course we can't claim the host because the block
>>>>>
>>>>> yep.
>>>>>
>>>>>> driver already has it, but I am not sure this is the right place to do
>>>>>> this.
>>>>>> My first question is how come the I/O times out if the card is still
>>>>>> present i.e. you are only unbinding the host controller, so you should
>>>>>> remove the card while the host controller and card are still functional?
>>>>>
>>>>> The card is still functional for my host (but maybe not if ->remove()
>>>>> touches the vmmc or vqmmc parts), but the host controller isn't, as the
>>>>> ->remove() calls.
>>>>
>>>> I don't understand why the host controller isn't functional. I tried it
>>>
>>> I guess we have a misunderstanding about what "functional" means. What
>>> I want to say is when unbinding the host driver, mmc_remove_host() is
>>> called, so the target host isn't present from the view of core layer,
>>> right?
>>>
>>>> with sdhci and it just waits for dd to finish but there are no I/O errors.
>>>>
>>>> Is there a use-case for unbinding quickly?
>>>
>>> Oh, I think I made a mistake in commit msg, the repro steps should be
>>> (1) run dd in background:
>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>>> (2) unbind your driver, for instance:
>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>
>>> Then yes, there are no I/O errors and it waits for dd to finish. But
>>> when waiting for dd to finish, console got stuck, and can't respond to
>>> any ctrl+{c,d,z}, even sysrq. The time depends on how much I/O it submits,
>>> the above step (1) makes it stuck for 2 minutes for my system.
>>
>> Job control on a console might not always work, especially with busybox.
>> I guess if you run the unbind in the background, then there is not a problem.
>
> The dd case is just a quick reproducer but the real case is the
> Android system has many vold (similar to udev) driven partition scan
> tools/applications that fire I/Os once sd card mounted. The applications
> are busy waiting there if we check background job by "top", which makes
> the user experience very bad in this way.

Is this related to unbinding the host controller?

>
>>
>> Given that unbinding the host controller while it is still in use is a bad
>> thing, why do we care?
>
> I just checked the log and wonder why mmc_mq_queue_rq still allows
> request to dispatch?
> Maybe it's a bad thing to unbind the controller
> when it's used, but that's not forbidden, at least it should behave
> the same as a real physical hot-plug.

Should it? From memory I vaguely recall seeing EXT4 doing a sync before its
partition gets yanked from beneath it. If that is right it would tend to
indicate that there is not an expectation that I/O should fail immediately.

> Another reason I do care about this is because I usually can't access
> my test devices, but only remotely do jobs on their console. So if I need
> to reset/hot-plug the card, I usually do unbind/bind, but the partition
> scan applications + unbind make the console stuck for a longer time
> than expected which is very painful for me to wait for it to finish. :(

Why not do the unbind in the background and then kill the partition scan?
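
A minimal sketch of that suggestion, assuming the driver directory and device
name from the repro steps quoted earlier and assuming pkill is available on
the target. The function names and the process-name pattern are made up for
illustration and are not part of any existing tool.

```shell
#!/bin/sh
# Sketch only: nothing here has been run against real hardware.

# Write DEVICE to DRIVER_DIR/unbind from a background subshell, so the
# interactive shell is not the process that blocks while queued I/O drains.
# The PID of the background job is left in UNBIND_PID.
unbind_in_background() {
    driver_dir=$1
    device=$2
    ( echo "$device" > "$driver_dir/unbind" ) &
    UNBIND_PID=$!
}

# Stop whatever is still issuing I/O. PATTERN is a placeholder for the
# partition-scan tool's command line; pkill -f matches against the full
# command line, so pick a pattern that does not match the calling script.
kill_scanners() {
    pattern=$1
    pkill -f "$pattern" || true
}

# Example (commented out; needs root and the real device):
#   unbind_in_background /sys/bus/platform/drivers/dwmmc_rockchip fe320000.dwmmc
#   kill_scanners 'partition-scan-tool'   # hypothetical process name
#   wait "$UNBIND_PID"
```

The PID is kept in a variable rather than printed, because capturing it with
command substitution would wait for the background subshell and bring back
the blocking this is meant to avoid.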