On Fri, Aug 01, 2014 at 10:27:25AM +0100, Lee Jones wrote: > On Thu, 31 Jul 2014, Brian Norris wrote: > > On Thu, Jul 31, 2014 at 05:47:09PM +0100, Lee Jones wrote: > > > > > arch/arm/boot/dts/stih41x-b2020.dtsi | 40 + > > > > > > > > This will need refreshed and sent as a separate patch, to go through the > > > > appropriate ARM tree. > > > > > > As above. > > > > OK, if you're really planning to send a final git pull request, you'll > > need to have this in a separate branch for them to take, then. > > Of course. Same goes with the DTS(I) changes. PSA: make sure your patches will bisect if you're going to send so many. I don't want to have to copy-and-paste my build results again. > > > > (Related: Coverity caught a whole bunch of these type of bugs in the MTD > > > > test modules. I have fixes queued up that I meant to test and submit > > > > soon.) > > > > > > I will attempt to play around with Coverity. > > > > Good luck :) Coverity isn't always the easiest to get running > > individually. You might have better luck (once your stuff is merged) > > Sorry merged where? To Linus's tree. Presumably that's what you're sending your patches to me for? :) Dave Jones runs a Coverity instance against mainline. > > checking out the free service offered for open source projects through > > github. There's a project instance set up for mainline: > > > > https://scan.coverity.com/projects/128?tab=overview > > Thanks. I'll have a play. > > > > > > +/* > > > > > + * Detect an erased page, tolerating and correcting up to a specified number of > > > > > + * bits at '0'. (For many devices, it is now deemed within spec for an erased > > > > > + * page to include a number of bits at '0', either as a result of read-disturb > > > > > + * behaviour or 'stuck-at-zero' failures.) Returns the number of corrected > > > > > + * bits, or a '-1' if we have exceeded the maximum number of bits at '0' (likely > > > > > + * to be a genuine uncorrectable ECC error). In the latter case, the data must > > > > > + * be returned unmodified, in accordance with the MTD API. > > > > > + */ > > > > > +static int check_erased_page(uint8_t *data, uint32_t page_size, int max_zeros) > > > > > > > > Another "check erased page" implementation? Sigh... it would be nice if > > > > we could agree on a common implementation to share. My last attempt was > > > > unsuccessful, since some drivers have some very odd requirements. > > > > > > > > > +{ > > > > > + uint8_t *b = data; > > > > > + int zeros = 0; > > > > > + int i; > > > > > + > > > > > + for (i = 0; i < page_size; i++) { > > > > > + zeros += hweight8(~*b++); > > > > > + if (zeros > max_zeros) > > > > > + return -1; > > > > > + } > > > > > + > > > > > + if (zeros) > > > > > + memset(data, 0xff, page_size); > > > > > > > > It seems like you're not considering the spare area at all. What if this > > > > is a mostly-blank page, with ECC data, but most of the bitflips are in > > > > the spare area? Then you will "correct" this page to all 0xFF, not > > > > noticing that this was really not a blank page at all. > > > > > > I could really use Angus' advice on this one. Although, I fear he may > > > have disappeared into the cosmos. If I remember correctly (I'm > > > getting rusty on this now), this controller doesn't allow access to > > > the spare area easily. I think it's all handled automatically > > > i.e. without intervention. > > > > That's tough. It's pretty hard to support NAND without *any* access to > > spare area. > > I think we _can_ do it, via the second (Flex) interface, but it appears > to be readonly and we only do so when initially scanning/building the > BBTs. Does your driver pass the MTD tests? (drivers/mtd/tests/) > I'm not sure I follow your example precisely. When you say that most > of the bitflips are in the spare area, do you mean that there's usable > data in there? What I mean is that because ALL [*] data (in- and out-of-band) is used by the error correction algorithm, then you need to use all of that data when determining that a page is erased, not just the data that you see in-band. Consider this: you program a page with all 0xFF data, except for a single byte of data. Then suppose your page experiences too many bitflips in the spare area. When you try to read this page back, you will experience an ECC error, and fall back to "checking for an erased page." But if this algorithm only looks at the in-band data, it may see mostly-0xFF, with fewer than 8 bits that are low. Because it ignored all of the ECC parity data stored OOB, then your algorithm might falsely declare your page "erased," when in fact it held data and is instead truly uncorrectable. This kind of subtle error might lead to silent corruption. I haven't exactly seen this in practice, since most MTD users will be programming significant amounts of non-0xFF data to pages, but it's really not a guarantee. And power cuts, for instance, provide a big source of uncertainty, where you can't expect that an unsound algorithm like this will work properly. Note on [*]: some ECC algorithms may ignore parts of the spare area, in order to provide ability to program special markers independently from the rest of the spare area. Brian -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html