On Tue, May 13, 2014, Seungwon Jeon wrote:
> Hi Doug,
>
> On Tue, May 13, 2014, Doug Anderson wrote:
> > Seungwon,
> >
> > On Sat, May 10, 2014 at 7:11 AM, Seungwon Jeon <tgih.jun@xxxxxxxxxxx> wrote:
> > > On Fri, May 09, 2014, Sonny Rao wrote:
> > >> On Thu, May 8, 2014 at 2:42 AM, Yuvaraj Kumar <yuvaraj.cd@xxxxxxxxx> wrote:
> > >> > Any comments on this patch?
> > >> >
> > >>
> > >> I'll just add that without this fix, running the tuning loop for UHS
> > >> modes is not reliable on dw_mmc because errors will happen and you
> > >> will eventually hit this race and hang.  This can happen any time
> > >> there is tuning like during boot or during resume from suspend.
> > >>
> > >> > On Thu, Mar 27, 2014 at 11:48 AM, Yuvaraj Kumar C D
> > >> > <yuvaraj.cd@xxxxxxxxx> wrote:
> > >> >> From: Doug Anderson <dianders@xxxxxxxxxxxx>
> > >> >>
> > >> >> If we happened to get a data error at just the wrong time the dw_mmc
> > >> >> driver could get into a state where it would never complete its
> > >> >> request.  That would leave the caller just hanging there.
> > >> >>
> > >> >> We fix this two ways and both of the two fixes on their own appear to
> > >> >> fix the problems we've seen:
> > >> >>
> > >> >> 1. Fix a race in the tasklet where the interrupt setting the data
> > >> >>    error happens _just after_ we check for it, then we get a
> > >> >>    EVENT_XFER_COMPLETE.  We fix this by repeating a bit of code.
> > > I think repeating is not good approach to fix race.
> > > In your case, XFER_COMPLETE preceded data error and DTO didn't come?
> > > It seems strange case.
> > > I want to know actual error value if you can reproduce.
> >
> > XFER_COMPLETE didn't necessarily precede data error.  Imagine this scenario:
> >
> > 1. Check for data error: nope
> > 2. Interrupt happens and we get a data error and immediately xfer complete
> > 3. Check for xfer complete: yup
> >
> > That's the state that we are handling.
> >
> > The system that dw_mmc uses where the interrupt handler has no locking
> > makes it incredibly difficult to get things right.  Can you propose an
> > alternate fix that would avoid the race?
> Thank you for detailed scenario.
> You're right.
> Have you consider using spin_lock() in interrupt handler?
> Then, we'll need to change spin_lock() to spin_lock_irqsave() in tasklet func.
> And other locks in driver may need to be adjusted properly.
>
> To return above scenario:
> 1. Check for data error: nope
> 2. Check for xfer complete: nope -> escape tasklet.
> 3. Interrupt happens and we get a data error and immediately xfer complete
> 4. Check for data error (Again in tasklet) : yup
>
> How about this change?
>
> Thanks,
> Seungwon Jeon
>
> > >
> > >> >> 2. Fix it so that if we detect that we've got an error in the "data
> > >> >>    busy" state and we're not going to do anything else we end the
> > >> >>    request and unblock anyone waiting.
> > >> >>
> > >> >> Signed-off-by: Doug Anderson <dianders@xxxxxxxxxxxx>
> > >> >> Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@xxxxxxxxx>
> > >> >> ---
> > >> >>  drivers/mmc/host/dw_mmc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
> > >> >>  1 file changed, 47 insertions(+)
> > >> >>
> > >> >> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> > >> >> index 1d77431..4c589f1 100644
> > >> >> --- a/drivers/mmc/host/dw_mmc.c
> > >> >> +++ b/drivers/mmc/host/dw_mmc.c
> > >> >> @@ -1300,6 +1300,14 @@ static void dw_mci_tasklet_func(unsigned long priv)
> > >> >>                         /* fall through */
> > >> >>
> > >> >>                 case STATE_SENDING_DATA:
> > >> >> +                       /*
> > >> >> +                        * We could get a data error and never a transfer
> > >> >> +                        * complete so we'd better check for it here.
> > >> >> +                        *
> > >> >> +                        * Note that we don't really care if we also got a
> > >> >> +                        * transfer complete; stopping the DMA and sending an
> > >> >> +                        * abort won't hurt.
> > >> >> +                        */
> > >> >>                         if (test_and_clear_bit(EVENT_DATA_ERROR,
> > >> >>                                                &host->pending_events)) {
> > >> >>                                 dw_mci_stop_dma(host);
> > >> >> @@ -1313,7 +1321,29 @@ static void dw_mci_tasklet_func(unsigned long priv)
> > >> >>                                 break;
> > >> >>
> > >> >>                         set_bit(EVENT_XFER_COMPLETE, &host->completed_events);
> > >> >> +
> > >> >> +                       /*
> > >> >> +                        * Handle an EVENT_DATA_ERROR that might have shown up
> > >> >> +                        * before the transfer completed.  This might not have
> > >> >> +                        * been caught by the check above because the interrupt
> > >> >> +                        * could have gone off between the previous check and
> > >> >> +                        * the check for transfer complete.
> > >> >> +                        *
> > >> >> +                        * Technically this ought not be needed assuming we
> > >> >> +                        * get a DATA_COMPLETE eventually (we'll notice the
> > >> >> +                        * error and end the request), but it shouldn't hurt.
> > >> >> +                        *
> > >> >> +                        * This has the advantage of sending the stop command.
> > >> >> +                        */
> > >> >> +                       if (test_and_clear_bit(EVENT_DATA_ERROR,
> > >> >> +                                              &host->pending_events)) {
> > >> >> +                               dw_mci_stop_dma(host);
> > >> >> +                               send_stop_abort(host, data);
> > >> >> +                               state = STATE_DATA_ERROR;
> > >> >> +                               break;
> > >> >> +                       }
> > >> >>                         prev_state = state = STATE_DATA_BUSY;
> > >> >> +
> > >> >>                         /* fall through */
> > >> >>
> > >> >>                 case STATE_DATA_BUSY:
> > >> >> @@ -1336,6 +1366,23 @@ static void dw_mci_tasklet_func(unsigned long priv)
> > >> >>                                         /* stop command for open-ended transfer*/
> > >> >>                                         if (data->stop)
> > >> >>                                                 send_stop_abort(host, data);
> > >> >> +                               } else {
> > >> >> +                                       /*
> > >> >> +                                        * If we don't have a command complete now we'll
> > >> >> +                                        * never get one since we just reset everything;
> > >> >> +                                        * better end the request.
> > >> >> +                                        *
> > >> >> +                                        * If we do have a command complete we'll fall
> > >> >> +                                        * through to the SENDING_STOP command and
> > >> >> +                                        * everything will be peachy keen.
> > >> >> +                                        *
> > >> >> +                                        * TODO: I guess we shouldn't send a stop?
Please remove TODO:
We already reset controller in dw_mci_data_complete() through "mmc: dw_mmc: change to use recommended reset procedure"?
I guess it depends on that patch.
Then, we don't need to stop sequence anymore.

Thanks,
Seungwon Jeon

> > >> >> +                                        */
> > >> >> +                                       if (!test_bit(EVENT_CMD_COMPLETE,
> > >> >> +                                                     &host->pending_events)) {
> > >> >> +                                               dw_mci_request_end(host, mrq);
> > >> >> +                                               goto unlock;
> > >> >> +                                       }
> > > Can you explain what happens above?
> > > What is it for?
> >
> > This was an alternate fix for the above, but appears to actually hit
> > in practice too.
> >
> > Said another way: if we don't add the extra checking for
> > EVENT_DATA_ERROR (above) we'll end up here.  ...and if we ever get
> > into this "else" and don't do _something_ then we'll wedge forever.
> >
> > -Doug
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
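
For anyone following the thread, the three-step race Doug describes in the
STATE_SENDING_DATA case can be modelled in plain userspace C. This is only a
sketch under loud assumptions: the MODEL_* names, model_irq() and
model_sending_data() are hypothetical stand-ins and not dw_mmc code, the
"interrupt" is an ordinary function call placed in the window between the two
checks, and there is no real concurrency or locking. It shows the check order
only, with and without the repeated EVENT_DATA_ERROR check from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's event bits (not the real values). */
#define MODEL_EVENT_DATA_ERROR    (1u << 0)
#define MODEL_EVENT_XFER_COMPLETE (1u << 1)

/* Hypothetical outcomes of one pass through the "sending data" step. */
enum model_state { MODEL_WAITING, MODEL_DATA_BUSY, MODEL_DATA_ERROR };

static unsigned int model_pending_events;

/* Userspace imitation of test_and_clear_bit(); not atomic. */
static bool model_test_and_clear(unsigned int bit)
{
	bool was_set = (model_pending_events & bit) != 0;

	model_pending_events &= ~bit;
	return was_set;
}

/* Simulated interrupt: data error and transfer complete arrive together. */
static void model_irq(void)
{
	model_pending_events |= MODEL_EVENT_DATA_ERROR | MODEL_EVENT_XFER_COMPLETE;
}

/*
 * One pass through the modelled step.
 * irq_in_window: fire the simulated interrupt between the two checks.
 * recheck:       repeat the data-error check after transfer complete.
 */
static enum model_state model_sending_data(bool irq_in_window, bool recheck)
{
	model_pending_events = 0;

	/* 1. Check for data error: nothing pending yet. */
	if (model_test_and_clear(MODEL_EVENT_DATA_ERROR))
		return MODEL_DATA_ERROR;

	/* 2. The interrupt lands exactly here. */
	if (irq_in_window)
		model_irq();

	/* 3. Check for transfer complete. */
	if (!model_test_and_clear(MODEL_EVENT_XFER_COMPLETE))
		return MODEL_WAITING;

	/* Repeated check: catches the error raised in step 2. */
	if (recheck && model_test_and_clear(MODEL_EVENT_DATA_ERROR))
		return MODEL_DATA_ERROR;

	return MODEL_DATA_BUSY;
}

/* Small self-check covering the cases discussed in the thread. */
static void model_self_check(void)
{
	/* Without the repeated check the error is missed at this step. */
	assert(model_sending_data(true, false) == MODEL_DATA_BUSY);
	/* With the repeated check the error is seen right away. */
	assert(model_sending_data(true, true) == MODEL_DATA_ERROR);
	/* No interrupt: nothing to do yet. */
	assert(model_sending_data(false, true) == MODEL_WAITING);
}
```

Calling model_self_check() from a main() exercises all three cases. In the
model, the missed error bit simply stays pending when the step moves on to the
busy state; whether the real driver then recovers depends on a later
DATA_COMPLETE arriving, which is the part of the discussion the model does not
cover.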