Re: [PATCH] mmc: core: Set card as removed in mmc_remove_card()

Adrian Hunter <adrian.hunter@xxxxxxxxx> · Thu, 12 Apr 2018 12:19:25 +0300

On 12/04/18 11:53, Shawn Lin wrote:
> On 2018/4/12 15:03, Adrian Hunter wrote:
>> On 12/04/18 03:43, Shawn Lin wrote:
>>> On 2018/4/11 18:26, Adrian Hunter wrote:
>>>> On 11/04/18 12:13, Shawn Lin wrote:
>>>>> On 2018/4/11 16:46, Adrian Hunter wrote:
>>>>>> On 11/04/18 10:24, Shawn Lin wrote:
>>>>>>> A simple continuous background I/O reliably makes my system hang for
>>>>>>> a long time when an unbind of the controller driver happens
>>>>>>> simultaneously. See:
>>>>>>>
>>>>>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &&
>>>>>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>>>>>
>>>>>>> The reason is that all pending requests wait for their timeout one by
>>>>>>> one, and BLK_STS_IOERR is never propagated in the first place when
>>>>>>> they are kicked from the queue.
>>>>>>> Set the card as removed immediately in mmc_remove_card() to solve it.
>>>>>>>
>>>>>>> Signed-off-by: Shawn Lin <shawn.lin@xxxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>>
>>>>>>>     drivers/mmc/core/bus.c | 1 +
>>>>>>>     1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
>>>>>>> index fc92c6c..c6c5dfe 100644
>>>>>>> --- a/drivers/mmc/core/bus.c
>>>>>>> +++ b/drivers/mmc/core/bus.c
>>>>>>> @@ -389,6 +389,7 @@ void mmc_remove_card(struct mmc_card *card)
>>>>>>>                 pr_info("%s: card %04x removed\n",
>>>>>>>                     mmc_hostname(card->host), card->rca);
>>>>>>>             }
>>>>>>> +        mmc_card_set_removed(card);
>>>>>>
>>>>>> Pedantically we should not call mmc_card_set_removed() if we have not
>>>>>> claimed the host.  Of course we can't claim the host because the block
>>>>>
>>>>> yep.
>>>>>
>>>>>> driver already has it, but I am not sure this is the right place to do
>>>>>> this.  My first question is how come the I/O times out if the card is
>>>>>> still present i.e. you are only unbinding the host controller, so you
>>>>>> should remove the card while the host controller and card are still
>>>>>> functional?
>>>>>
>>>>> The card is still functional for my host (but maybe not if ->remove()
>>>>> touches the vmmc or vqmmc parts), but the host controller isn't, since
>>>>> ->remove() has been called.
>>>>
>>>> I don't understand why the host controller isn't functional.  I tried it
>>>
>>> I guess we have a misunderstanding about what "functional" means. What
>>> I want to say is that when unbinding the host driver, mmc_remove_host()
>>> is called, so the target host isn't present from the point of view of
>>> the core layer, right?
>>>
>>>> with sdhci and it just waits for dd to finish but there are no I/O errors.
>>>>
>>>> Is there a use-case for unbinding quickly?
>>>
>>> Oh, I think I made a mistake in the commit msg; the repro steps should be:
>>> (1) run dd in the background:
>>> dd if=/dev/mmcblk0 of=/dev/null bs=512k count=100000 &
>>> (2) unbind your driver, for instance:
>>> echo fe320000.dwmmc > /sys/bus/platform/drivers/dwmmc_rockchip/unbind
>>>
>>> Then yes, there are no I/O errors and it waits for dd to finish. But
>>> while waiting for dd to finish, the console gets stuck and can't respond
>>> to any ctrl+{c,d,z}, or even sysrq. The time depends on how much I/O is
>>> submitted; step (1) above makes it stuck for 2 minutes on my system.
>>
>> Job control on a console might not always work, especially with busybox.
>> I guess if you run the unbind in the background, then there is not a problem.
> 
> The dd case is just a quick reproducer, but the real case is that the
> Android system has many vold (similar to udev) driven partition scan
> tools/applications that fire I/Os once the SD card is mounted. The
> applications are busy waiting there if we check the background jobs with
> "top", which makes the user experience very bad.

Is this related to unbinding the host controller?

> 
>>
>> Given that unbinding the host controller while it is still in use is a bad
>> thing, why do we care?
> 
> I just checked the log and wonder why mmc_mq_queue_rq still allows
> requests to be dispatched?  Maybe it's a bad thing to unbind the controller
> while it's in use, but that's not forbidden; at least it should behave
> the same as a real physical hot-plug.

Should it?  From memory I vaguely recall seeing EXT4 doing a sync before its
partition gets yanked from beneath it.  If that is right it would tend to
indicate that there is not an expectation that I/O should fail immediately.

> Another reason I care about this is that I usually can't physically
> access my test devices, and can only work remotely on their consoles. So
> if I need to reset/hot-plug the card, I usually do unbind/bind, but the
> partition scan applications + unbind make the console stuck for much
> longer than expected, which is very painful to wait out. :(

Why not do the unbind in the background and then kill the partition scan?
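
Something along these lines, as a rough and untested sketch. The device and
driver names are the ones from your repro; unbind_bg, UNBIND_PID and SCAN_PID
are just names made up here for illustration, and whether the kill takes
effect promptly will depend on what the scanner is blocked on:

```shell
# Sketch only: do the sysfs unbind from a background subshell so the
# foreground shell is not blocked on the write while pending I/O drains.
unbind_bg() {
    # $1 = device name, $2 = driver directory (e.g. under /sys/bus/platform/drivers)
    ( echo "$1" > "$2/unbind" ) &
    UNBIND_PID=$!
}

# Example use (needs root; SCAN_PID is a hypothetical placeholder for the
# PID of the partition scan process):
#   unbind_bg fe320000.dwmmc /sys/bus/platform/drivers/dwmmc_rockchip
#   kill "$SCAN_PID"
#   wait "$UNBIND_PID"
```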