> -----Original Message----- > From: Adrian Hunter [mailto:adrian.hunter@xxxxxxxxx] > Sent: Monday, September 20, 2010 6:39 PM > To: Ghorai, Sukumar > Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; > Adrian Hunter > Subject: Re: [PATCH] mmc: failure of block read wait for long time > > On 20/09/10 15:37, ext Ghorai, Sukumar wrote: > > > > > >> -----Original Message----- > >> From: Adrian Hunter [mailto:adrian.hunter@xxxxxxxxx] > >> Sent: Monday, September 20, 2010 5:20 PM > >> To: Ghorai, Sukumar > >> Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; > >> Adrian Hunter > >> Subject: Re: [PATCH] mmc: failure of block read wait for long time > >> > >> On 20/09/10 11:57, Ghorai, Sukumar wrote: > >>> Adrian, > >>> > >>>> -----Original Message----- > >>>> From: Adrian Hunter [mailto:adrian.hunter@xxxxxxxxx] > >>>> Sent: Monday, September 20, 2010 1:24 PM > >>>> To: Ghorai, Sukumar > >>>> Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; > >>>> Adrian Hunter > >>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time > >>>> > >>>> On 14/09/10 08:15, ext Ghorai, Sukumar wrote: > >>>>> Adrian, > >>>>> > >>>>> [..snip..] > >>>>>>>>> [Ghorai] Adrian, > >>>>>>>>> Yes this works and reduced the retry by 1/4 (2048 to 512 times > for > >>>> 1MB > >>>>>>>> data read) form the original code; > >>>>>>>>> Initially it was retrying for each page(512 bytes) after multi- > >> block > >>>>>>>> read fail; but this solution is retying for each segment(2048 > >> bytes); > >>>>>>>>> 1. Now say block layrer reading 1MB and failed for the 1st > segment. > >>>> So > >>>>>>>> it will still retry for 1MB/2048-bytes, i.e. 512 times. > >>>>>>>>> 2. So do you think any good reason to retry again and again? > >>>>>>>> If you have 1MB that is not readable, it sounds like the card is > >>>> broken. > >>>>>>>> Why are so many reads failing? Has the card been removed? > >>>>>>>> > >>>>>>>> You might very rarely see ECC errors in a small number of sectors, > >>>>>>>> but more than that sounds like something else is broken. > >>>>>>> > >>>>>>> [Ghorai] yes, one example is we remove the card when reading data, > >>>>>> > >>>>>> Well, that is a different case. Once the card has gone, the block > >>>> driver > >>>>>> can (and will once the remove method is called) error out all I/O > >>>>>> requests without sending them to MMC. That doesn't happen until > >> there > >>>>>> is a card detect interrupt and a resulting rescan. > >>>>> > >>>>> [Ghorai] here we are discussing two problem, > >>>>> 1. If IO failed how to stop retry; because of - > >>>>> a. internal card error > >>>>> b. issue in Filesystem, driver, or host controller issue > >>>>> c. or cards removed. > >>>>> > >>>>> 2. And 2nd how to sync block-layer IO, if card removed, > >>>>> a. case 1: when card removed interrupt support by the platform > >>>>> b. case 2: when card removed interrupt does not support by the > >>>> platform? > >>>>> > >>>>>> > >>>>>> A possible solution is to put a flag on mmc_card to indicate > >> card_gone > >>>>>> that gets set as soon as the drivers card detect interrupt shows > >> there > >>>>>> is no card (hoping that we are not getting any bouncing on card > >> detect) > >>>>>> and then have mmc_wait_for_req() simple return -ENODEV immediately > if > >>>>>> the card_gone flag is set. Finally, if the mmc block driver sees > >>>>>> a -ENODEV error, it should also check the card_gone flag (via a new > >>>>>> core function) and if the card is gone, do not retry - and perhaps > >>>>>> even error out the rest of the I/O request queue as well. > >>>>> > >>>>> [Ghorai] your idea address the 2.a case, but not 2.b, 1.a, 1.b > >>>> > >>>> The card removal case can be extended to use the bus ops detect > method > >>>> when there is no card detect irq. I will send a RFC patch. > >>>> > >>>> With respect to 1.a: > >>>> - If the card has an internal error, then it is broken. The user > >>>> should remove the card and use a better one. I do not see how > >> reducing > >>>> retry delays really helps the user very much. Arguably if the > card > >>>> becomes unresponsive, the MMC core could provide a facility to > >>>> reinitialise the card, but that is yet another issue. > >>>> > >>>> With respect to 1.b: > >>>> - The file system cannot cause the block driver to have I/O > errors. > >>>> - If there are errors in the driver they should be fixed. > >>>> - If there are hardware problems with the host controller, then > >>>> it is up to the host controller driver to deal with them e.g. > >>>> by resetting the controller. I don't see what this has to do > with > >>>> the block driver. > >>>> > >>>> You leave out the important case of ECC errors. I am concerned about > >>>> this because of the possibility that it happens inside a file system > >>>> journal e.g. EXT4 journal. Perhaps the journal may be recovered if > the > >>>> error only affects the last transaction, but perhaps not if it > destroys > >>>> other transactions - which could happen if the approach you suggest > >>>> is taken. > >>>> > >>> [Ghorai] Thanks lot for your descriptive answer. > >>> 1. Can you answer this? 2.b. case 2: when card removed interrupt does > >> not support by the platform? > >> > >> As I wrote above: The card removal case can be extended to use the bus > ops > >> detect method when there is no card detect irq. I will send a RFC > patch. > >> > >>> > >>> 2. Why block layer handling for inter-leave data? Can you give example > >> diver who is returning interleave data? And how to tell application > that > >> buffer having interleave data? > >> > >> I am not sure what you mean by interleave data, but file systems for > >> example > >> are free to map any block to any file, directory or file system object, > >> so a consecutive series of sectors may contain unrelated data. Up to a > >> maximum > >> size, the block layer merges I/O requests when the sectors are > consecutive, > >> so an I/O request can also contain unrelated data. > > > > [Ghorai] > > 1. I don't think so, FS know where data exists and where is the free > space. Except oth cluster. > > I was not talking about free space. I was giving an example > of why it is not possible to assume anything about what is in > an I/O request. > > > > > 2. Where its mentioned in block media that for segment-x[i],x[j] data > read fail out of all all requested segments form [1..n]. > > And I never gone through any driver/protocol, that retry the next i+1th > segment where ith-segment is failed. And for that my suggestion is > preferred. > > The SD/MMC protocol does not indicate which sector has the error. > There is no possibility of trying the ith+1-segment because i is > unknown. [Ghorai] I understand, but I think still you are fever of retrying each and every sector(i, i+1, i+2, ..) for the all request segments sequentially, when single block read for the i-th sector failed. > > > > >> > >>> > >>>>> > >>>>> And the solution I was proposing to return the status of IO failure > as > >>>> soon as possible to above layer; and handle the card removed > interrupt > >>>> separately or any other issue in h/w or s/w or combination of both. > Or > >>>> just think again when platform don't have the card remove interrupt. > >>>>> > >>>>> So my patch addresses the 1st part > >>>> > >>>> It is absolutely unacceptable to return I/O errors to the upper > layers > >>>> for segments that do not have errors. > >>>> > >>>>> And for the 2nd part we can submit the patch anytime. > >>>>> > >>>>>> > >>>>>> I can suggest a patch if you want but I am on vacation next week so > >>>>>> it will have to wait a couple of weeks. > >>>>>> > >>>>>>> And moreover we should not give the interleave data to apps, as we > >>>> don't > >>>>>> have option to tell application for the valid data. > >>>>>>> > >>>>> [..snip..] > >>>>> http://comments.gmane.org/gmane.linux.kernel.mmc/2714 > >>>>> > >>>>>>> > >>>>> > >>>>> > >>> > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" > in > >>> the body of a message to majordomo@xxxxxxxxxxxxxxx > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>> > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html