Adrian,

> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@xxxxxxxxx]
> Sent: Monday, September 20, 2010 1:24 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> Adrian Hunter
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>
> On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
> > Adrian,
> >
> > [..snip..]
> >>>>> [Ghorai] Adrian,
> >>>>> Yes, this works and reduced the retries to 1/4 (from 2048 to 512
> >>>>> for a 1MB data read) compared with the original code;
> >>>>> Initially it was retrying for each page (512 bytes) after a
> >>>>> multi-block read failure; but this solution is retrying for each
> >>>>> segment (2048 bytes);
> >>>>> 1. Now say the block layer is reading 1MB and fails on the 1st
> >>>>> segment. It will still retry 1MB/2048 bytes, i.e. 512 times.
> >>>>> 2. So do you think there is any good reason to retry again and again?
> >>>> If you have 1MB that is not readable, it sounds like the card is
> >>>> broken.
> >>>> Why are so many reads failing? Has the card been removed?
> >>>>
> >>>> You might very rarely see ECC errors in a small number of sectors,
> >>>> but more than that sounds like something else is broken.
> >>>
> >>> [Ghorai] yes, one example is when we remove the card while reading data,
> >>
> >> Well, that is a different case. Once the card has gone, the block
> >> driver can (and will once the remove method is called) error out all
> >> I/O requests without sending them to MMC. That doesn't happen until
> >> there is a card detect interrupt and a resulting rescan.
> >
> > [Ghorai] here we are discussing two problems,
> > 1. If I/O failed, how to stop the retries; the failure may be because of -
> >    a. an internal card error
> >    b. an issue in the filesystem, driver, or host controller
> >    c. or the card being removed.
> >
> > 2. And 2nd, how to sync block-layer I/O if the card is removed,
> >    a. case 1: when the card-removed interrupt is supported by the platform
> >    b. case 2: when the card-removed interrupt is not supported by the
> >       platform?
> >
> >>
> >> A possible solution is to put a flag on mmc_card to indicate card_gone
> >> that gets set as soon as the driver's card detect interrupt shows there
> >> is no card (hoping that we are not getting any bouncing on card detect)
> >> and then have mmc_wait_for_req() simply return -ENODEV immediately if
> >> the card_gone flag is set. Finally, if the mmc block driver sees
> >> a -ENODEV error, it should also check the card_gone flag (via a new
> >> core function) and if the card is gone, do not retry - and perhaps
> >> even error out the rest of the I/O request queue as well.
> >
> > [Ghorai] your idea addresses the 2.a case, but not 2.b, 1.a, 1.b
>
> The card removal case can be extended to use the bus ops detect method
> when there is no card detect irq. I will send an RFC patch.
>
> With respect to 1.a:
> - If the card has an internal error, then it is broken. The user
> should remove the card and use a better one. I do not see how reducing
> retry delays really helps the user very much. Arguably if the card
> becomes unresponsive, the MMC core could provide a facility to
> reinitialise the card, but that is yet another issue.
>
> With respect to 1.b:
> - The file system cannot cause the block driver to have I/O errors.
> - If there are errors in the driver they should be fixed.
> - If there are hardware problems with the host controller, then
> it is up to the host controller driver to deal with them e.g.
> by resetting the controller. I don't see what this has to do with
> the block driver.
>
> You leave out the important case of ECC errors. I am concerned about
> this because of the possibility that it happens inside a file system
> journal e.g. EXT4 journal. Perhaps the journal may be recovered if the
> error only affects the last transaction, but perhaps not if it destroys
> other transactions - which could happen if the approach you suggest
> is taken.
>

[Ghorai] Thanks a lot for your descriptive answer.
1. Can you answer this? 2.b.
case 2: when the card-removed interrupt is not supported by the platform?
2. Why is the block layer handling interleaved data? Can you give an example
of a driver that returns interleaved data? And how do we tell the application
that the buffer contains interleaved data?

> >
> > And the solution I was proposing is to return the status of the I/O
> > failure to the layer above as soon as possible, and to handle the
> > card-removed interrupt, or any other issue in h/w or s/w or a
> > combination of both, separately. Or just think again about platforms
> > that don't have the card-removed interrupt.
> >
> > So my patch addresses the 1st part
>
> It is absolutely unacceptable to return I/O errors to the upper layers
> for segments that do not have errors.
>
> > And for the 2nd part we can submit the patch anytime.
> >
> >>
> >> I can suggest a patch if you want but I am on vacation next week so
> >> it will have to wait a couple of weeks.
> >>
> >>> And moreover we should not give the interleaved data to apps, as we
> >>> don't have an option to tell the application which data is valid.
> >>>
> > [..snip..]
> > http://comments.gmane.org/gmane.linux.kernel.mmc/2714
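
For reference, the card_gone idea quoted above could be sketched roughly as follows. This is a user-space illustration only, not kernel code and not a patch from this thread: struct sketch_card and every sketch_* function are hypothetical names standing in for mmc_card, mmc_wait_for_req() and the proposed "new core function". It assumes the flag is set from the card detect path and that the block driver reads one segment per request.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct mmc_card. */
struct sketch_card {
	bool card_gone;      /* set when card detect reports no card */
	bool read_fails;     /* demo only: simulate a media error on every read */
	unsigned int issued; /* requests that actually reached the "host" */
};

/* Hypothetical card detect hook: record whether a card is present. */
static void sketch_card_detect(struct sketch_card *card, bool present)
{
	card->card_gone = !present;
}

/* Stand-in for mmc_wait_for_req(): fail fast once the card is gone. */
static int sketch_wait_for_req(struct sketch_card *card)
{
	if (card->card_gone)
		return -ENODEV;
	card->issued++;
	return card->read_fails ? -EIO : 0;
}

/* Stand-in for the proposed new core function. */
static bool sketch_card_is_gone(const struct sketch_card *card)
{
	return card->card_gone;
}

/*
 * Block-driver side: read nsegs segments, one request each.
 * If a request returns -ENODEV and the card is gone, stop at once and
 * fail the remaining segments without sending them to the host.
 * Any other error is recorded for that segment only, so segments that
 * read correctly are never reported as failed.
 * Returns the number of segments read successfully.
 */
static size_t sketch_read_segments(struct sketch_card *card, size_t nsegs,
				   int *seg_status)
{
	size_t done = 0;

	for (size_t i = 0; i < nsegs; i++) {
		int err = sketch_wait_for_req(card);

		if (err == -ENODEV && sketch_card_is_gone(card)) {
			for (size_t j = i; j < nsegs; j++)
				seg_status[j] = -ENODEV;
			break;
		}
		seg_status[i] = err;
		if (!err)
			done++;
	}
	return done;
}
```

With a 1MB read split into 512 segments of 2048 bytes, a removed card then costs no requests to the host at all, while a single bad segment still fails only that segment.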