On 13/04/18 03:53, Shawn Lin wrote: > On 2018/4/12 17:19, Adrian Hunter wrote: >> On 12/04/18 11:53, Shawn Lin wrote: >>> On 2018/4/12 15:03, Adrian Hunter wrote: >>>> On 12/04/18 03:43, Shawn Lin wrote: >>>>> On 2018/4/11 18:26, Adrian Hunter wrote: >>>>>> On 11/04/18 12:13, Shawn Lin wrote: >>>>>>> On 2018/4/11 16:46, Adrian Hunter wrote: >>>>>>>> On 11/04/18 10:24, Shawn Lin wrote: >>>>>>>>> A simply continueous background I/O could 100% make the system >>>>>>>>> stuck for >>>>>>>>> a long time in my system when a unbind for the controller driver >>>>>>>>> happens >>>>>>>>> simultaneously. See: >>>>>>>>> >>>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 && >>>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind >>>>>>>>> >>>>>>>>> The reason is all pending requests wait for timeout one by one, but >>>>>>>>> never >>>>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the >>>>>>>>> queue. >>>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it. >>>>>>>>> >>>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@xxxxxxxxxxxxxx> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> drivers/mmc/core/bus.c | 1 + >>>>>>>>> 1 file changed, 1 insertion(+) >>>>>>>>> >>>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c >>>>>>>>> index fc92c6c..c6c5dfe 100644 >>>>>>>>> --- a/drivers/mmc/core/bus.c >>>>>>>>> +++ b/drivers/mmc/core/bus.c >>>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card) >>>>>>>>> pr_info("%s: card %04x removed\n", >>>>>>>>> mmc_hostname(card->host), card->rca); >>>>>>>>> } >>>>>>>>> + mmc_card_set_removed(card); >>>>>>>> >>>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not >>>>>>>> claimed the host. Of course we can't claim the host because the block >>>>>>> >>>>>>> yep. >>>>>>> >>>>>>>> driver already has it, but I am not sure this is the right place to do >>>>>>>> this. >>>>>>>> My first question is how come the I/O times out if the card is >>>>>>>> still >>>>>>>> present i.e. you are only unbinding the host controller, so you should >>>>>>>> remove the card while the host controller and card are still >>>>>>>> functional? >>>>>>> >>>>>>> The card is still functional for my host(but maybe not if ->remove() >>>>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the >>>>>>> ->remove() calls. >>>>>> >>>>>> I don't understand why the host controller isn't functional. I tried it >>>>> >>>>> I guess we have misunderstanding about what the "functional" means. What >>>>> I want to say is when unbinding the host driver, mmc_remove_host() is >>>>> called, so the target host isn't present from the view of core layer, >>>>> right? >>>>> >>>>>> with sdhci and it just waits for dd to finish but there are no I/O >>>>>> errors. >>>>>> >>>>>> Is there a use-case for unbinding quickly? >>>>> >>>>> Oh, I think I made a mistake in commit msg, the repro steps should be >>>>> (1) run dd in background: >>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 & >>>>> (2) unbind you driver, for instance: >>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind >>>>> >>>>> Then yes, there is no I/O errors and it waits for dd to finish. But >>>>> when waiting for dd to finish, console got stuck, and can't respond to >>>>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits, >>>>> the above setp(1) make it stuck for 2 miniutes for my system. >>>> >>>> Job control on a console might not always work, especially with busybox. >>>> I guess if you run the unbind in the background, then there is not a >>>> problem. >>> >>> The dd case is just a quick reproduce but the real case is the >>> Android system has many vold(similar to udev) driven partition scan >>> tools/applications that fire I/Os once sd card mounted. The applications >>> is busy waiting there if we check background job by "top", which make >>> the user experience very bad in this way. >> >> Is this related to unbinding the host controller? >> >>> >>>> >>>> Given that unbinding the host controller while it is still in use is a bad >>>> thing, why do we care? >>> >>> I just checked the log and wonder why mmc_mq_queue_rq still allows >>> request to dispatch? Maybe it's a bad thing to unbind the controller >>> when it's used, but that's not forbade, at least it should behave >>> the same with real physically hot-plug. >> >> Should it? From memory I vaguely recall seeing EXT4 doing a sync before its >> partition gets yanked from beneath it. If that is right it would tend to >> indicate that there is not an expectation that I/O should fail immediately. > > I have little knowledge about EXT4 :),but that happend for cards with > different fs format. > >> >>> Another reason I do care about this is because I usually can't access to >>> my test devices, but only remotely do job on their console. So if I need >>> to reset/hot-plug the card, I usually do unbind/bind, but the partition >>> scan applications + unbind make the console stuck for a longer time >>> than expected which is very painful for me to wait for finishment. :( >> >> Why not do the unbind in the background and then kill the partition scan? > > Yes, I could :). But the conversation is deviating from core issue that > why the MMC_CARD_REMOVED flag is left apart for the card but the card > is removed. That being said, don't you think "mmc1: card 1234 removed" > means mmc_card_removed(card) should be true? No. It means the message is misleading for the unbind case. If you want a way to mark the card as removed, couldn't you add something to debugfs. -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html