On 13/04/18 10:21, Shawn Lin wrote: > On 2018/4/13 15:03, Adrian Hunter wrote: >> On 13/04/18 03:53, Shawn Lin wrote: >>> On 2018/4/12 17:19, Adrian Hunter wrote: >>>> On 12/04/18 11:53, Shawn Lin wrote: >>>>> On 2018/4/12 15:03, Adrian Hunter wrote: >>>>>> On 12/04/18 03:43, Shawn Lin wrote: >>>>>>> On 2018/4/11 18:26, Adrian Hunter wrote: >>>>>>>> On 11/04/18 12:13, Shawn Lin wrote: >>>>>>>>> On 2018/4/11 16:46, Adrian Hunter wrote: >>>>>>>>>> On 11/04/18 10:24, Shawn Lin wrote: >>>>>>>>>>> A simply continueous background I/O could 100% make the system >>>>>>>>>>> stuck for >>>>>>>>>>> a long time in my system when a unbind for the controller driver >>>>>>>>>>> happens >>>>>>>>>>> simultaneously. See: >>>>>>>>>>> >>>>>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 && >>>>>>>>>>> echo fe320000.dwmmc > >>>>>>>>>>> /sys/bus/platform/drivers/dwmmc_rockchip/unbind >>>>>>>>>>> >>>>>>>>>>> The reason is all pending requests wait for timeout one by one, but >>>>>>>>>>> never >>>>>>>>>>> propagates BLK_STS_IOERR in the first place when kicked from the >>>>>>>>>>> queue. >>>>>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve >>>>>>>>>>> it. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@xxxxxxxxxxxxxx> >>>>>>>>>>> --- >>>>>>>>>>> >>>>>>>>>>> drivers/mmc/core/bus.c | 1 + >>>>>>>>>>> 1 file changed, 1 insertion(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c >>>>>>>>>>> index fc92c6c..c6c5dfe 100644 >>>>>>>>>>> --- a/drivers/mmc/core/bus.c >>>>>>>>>>> +++ b/drivers/mmc/core/bus.c >>>>>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card) >>>>>>>>>>> pr_info("%s: card %04x removed\n", >>>>>>>>>>> mmc_hostname(card->host), card->rca); >>>>>>>>>>> } >>>>>>>>>>> + mmc_card_set_removed(card); >>>>>>>>>> >>>>>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not >>>>>>>>>> claimed the host. Of course we can't claim the host because the >>>>>>>>>> block >>>>>>>>> >>>>>>>>> yep. >>>>>>>>> >>>>>>>>>> driver already has it, but I am not sure this is the right place >>>>>>>>>> to do >>>>>>>>>> this. >>>>>>>>>> My first question is how come the I/O times out if the card is >>>>>>>>>> still >>>>>>>>>> present i.e. you are only unbinding the host controller, so you >>>>>>>>>> should >>>>>>>>>> remove the card while the host controller and card are still >>>>>>>>>> functional? >>>>>>>>> >>>>>>>>> The card is still functional for my host(but maybe not if ->remove() >>>>>>>>> touchs the vmmc or vqmmc parts), but the host controller isn't, as the >>>>>>>>> ->remove() calls. >>>>>>>> >>>>>>>> I don't understand why the host controller isn't functional. I >>>>>>>> tried it >>>>>>> >>>>>>> I guess we have misunderstanding about what the "functional" means. What >>>>>>> I want to say is when unbinding the host driver, mmc_remove_host() is >>>>>>> called, so the target host isn't present from the view of core layer, >>>>>>> right? >>>>>>> >>>>>>>> with sdhci and it just waits for dd to finish but there are no I/O >>>>>>>> errors. >>>>>>>> >>>>>>>> Is there a use-case for unbinding quickly? >>>>>>> >>>>>>> Oh, I think I made a mistake in commit msg, the repro steps should be >>>>>>> (1) run dd in background: >>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 & >>>>>>> (2) unbind you driver, for instance: >>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind >>>>>>> >>>>>>> Then yes, there is no I/O errors and it waits for dd to finish. But >>>>>>> when waiting for dd to finish, console got stuck, and can't respond to >>>>>>> any ctrl+{c,d,z}, even sysrq. The time depends how much I/O it submits, >>>>>>> the above setp(1) make it stuck for 2 miniutes for my system. >>>>>> >>>>>> Job control on a console might not always work, especially with busybox. >>>>>> I guess if you run the unbind in the background, then there is not a >>>>>> problem. >>>>> >>>>> The dd case is just a quick reproduce but the real case is the >>>>> Android system has many vold(similar to udev) driven partition scan >>>>> tools/applications that fire I/Os once sd card mounted. The applications >>>>> is busy waiting there if we check background job by "top", which make >>>>> the user experience very bad in this way. >>>> >>>> Is this related to unbinding the host controller? >>>> >>>>> >>>>>> >>>>>> Given that unbinding the host controller while it is still in use is a >>>>>> bad >>>>>> thing, why do we care? >>>>> >>>>> I just checked the log and wonder why mmc_mq_queue_rq still allows >>>>> request to dispatch? Maybe it's a bad thing to unbind the controller >>>>> when it's used, but that's not forbade, at least it should behave >>>>> the same with real physically hot-plug. >>>> >>>> Should it? From memory I vaguely recall seeing EXT4 doing a sync before >>>> its >>>> partition gets yanked from beneath it. If that is right it would tend to >>>> indicate that there is not an expectation that I/O should fail immediately. >>> >>> I have little knowledge about EXT4 :),but that happend for cards with >>> different fs format. >>> >>>> >>>>> Another reason I do care about this is because I usually can't access to >>>>> my test devices, but only remotely do job on their console. So if I need >>>>> to reset/hot-plug the card, I usually do unbind/bind, but the partition >>>>> scan applications + unbind make the console stuck for a longer time >>>>> than expected which is very painful for me to wait for finishment. :( >>>> >>>> Why not do the unbind in the background and then kill the partition scan? >>> >>> Yes, I could :). But the conversation is deviating from core issue that >>> why the MMC_CARD_REMOVED flag is left apart for the card but the card >>> is removed. That being said, don't you think "mmc1: card 1234 removed" >>> means mmc_card_removed(card) should be true? >> >> No. It means the message is misleading for the unbind case. > > Hmm.. I don't think so. The unbind case means the controller/slot has > gone, so system can never access to the card any more. I admit it's > physcially present, but not logically. So it looks same to me that > I have a card in slot but the kernel lacks driver to support the card. No we always remove children (card) before the parent (host controller). > >> >> If you want a way to mark the card as removed, couldn't you add something to >> debugfs. > > It's irrelevant, as no in-kernel scenario check the debugfs. They only I am not sure that is true. For a removable card, setting the card as removed and scheduling a rescan would probably remove it. > care about status propagated from MMC block layer. If the card could > never be access any more, you shouldn't still allow requests pass > through the first check of mmc_card_removed in mmc_mq_queue_rq. Do you > agree? :) No, I think we have established that the card is fully functional. -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html