On Thu, 2014-11-13 at 14:13 +0200, Tanya Brokhman wrote:
> > In your solution you have to do more work maintaining the counters and
> > writing them. With read solution you do more work reading data.
>
> But the maintaining work is minimal here. ++the counter on every read is
> all that is required and verify its value. O(1)...

Let's consider the R/O FS on top of UBI case. Fastmap will only be
updated when there are erase operations, which may only be caused by
scrubbing in this case. IOW, fastmap will be updated extremely rarely.

And suppose there is no clean unmount ever happening. Will we always
lose the read counters and set them to half the threshold all the time?
Even if it was Threshold-1 before, it becomes Threshold/2 after a power
cut?

Don't we actually want to write the read counters when they change
significantly enough?

> I know... We got the threshold value (that is exposed in my patches as
> a define, you just missed it) from the NAND manufacturer, asking to
> take into consideration the temperature the device will operate at. I
> know it's still an estimation, but so is the program/erase threshold.
> Since it was set by the manufacturer, I think it's the best one we can
> hope for.

I wonder how constant the threshold is.

* Does it change with time, as the eraseblock becomes more worn out?
  Say, the PEB resource is 10000 erase cycles. Will the threshold be
  the same for a PEB at 0 erase cycles and at 5000 erase cycles?

* Does it depend on the eraseblock?

* Does it depend on the I/O in other eraseblocks?

I just wonder how pessimistic the threshold number manufacturers give
is. Just curious to learn more about this number and to have an idea of
how reliable it is.

> > You will end up scrubbing a lot earlier than needed. Here comes the
> > performance loss too (and energy). And you will eventually end up
> > scrubbing too late.
>
> I don't see why I would end up scrubbing too late?

Well, one example - see above: you lose the read counters often, always
reset them to threshold/2, and end up reading more than the threshold.

The other doubt is whether the threshold you use is actually the right
one for the worst-case usage scenario of the end product. But probably
this is just about learning more about this threshold value.

> I can't guarantee it won't bit-flip, I don't think anyone could, but I
> can say that with my implementation the chance of a bit-flip is
> reduced.

That was my point. There is already a solution for the problem you are
trying to solve. It is implemented. And it covers not just the problem
you are solving, but the other problems of NAND too.

So probably what is missing is some kind of better analysis or
experimental proof that the solution which is already implemented
(let's call it "periodic read") is defective.

Maybe I should expand a bit more on why the periodic read solution does
not look bad to me. If the ECC is strong enough for the flash chip in
question, then bit-flips will accumulate slowly enough. First one
bit-flip, then 2, then 3, etc. All you need to do is make your read
period short enough to make sure no PEB accumulates too many bit-flips.
E.g., modern ECCs cover 8 or more bit-flips.

And the other compelling point here is that this covers all the other
NAND effects as well. All of them lead to more bit-flips in the end,
right? And you just fix bit-flips when they come. You do not care why
they came. You just deal with them. And what is very nice is that you
do not need to implement anything, or you implement very little.
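
To make this concrete, here is a minimal user-space sketch of such a
"periodic read" task (this is not the actual UBI code; peb_read(),
peb_scrub(), and the constants are made-up placeholders for whatever
the real flash layer provides). The only point is the control flow:
walk over every PEB once per period, and whenever a read comes back
with correctable bit-flips, scrub that PEB - no per-PEB read counters
needed.

/* Hedged sketch of a "periodic read" scrubber -- not UBI code.
 * peb_read() and peb_scrub() are hypothetical placeholders for
 * whatever the real flash layer provides.
 */
#include <stdio.h>
#include <unistd.h>

#define PEB_COUNT      1024   /* made-up media size */
#define READ_PERIOD_S  3600   /* "often enough" is product-specific */

enum read_result { READ_OK, READ_BITFLIPS, READ_ERROR };

/* Placeholder: read back a whole PEB and report whether ECC had to
 * correct anything. A real implementation would talk to MTD/UBI. */
static enum read_result peb_read(int pnum)
{
        (void)pnum;
        return READ_OK;
}

/* Placeholder: copy the data to a fresh PEB and erase the old one. */
static void peb_scrub(int pnum)
{
        printf("scrubbing PEB %d\n", pnum);
}

int main(void)
{
        for (;;) {
                for (int pnum = 0; pnum < PEB_COUNT; pnum++) {
                        switch (peb_read(pnum)) {
                        case READ_BITFLIPS:
                                /* Bit-flips were correctable this time;
                                 * refresh the data before they grow
                                 * beyond the ECC strength. */
                                peb_scrub(pnum);
                                break;
                        case READ_ERROR:
                                /* Uncorrectable data needs separate
                                 * recovery handling; out of scope here. */
                                break;
                        case READ_OK:
                                break;
                        }
                }
                sleep(READ_PERIOD_S);
        }
}

The "read comes back with bit-flips" condition is exactly what the
existing scrubbing machinery already reacts to during normal reads; the
only thing the sketch adds on top is the periodic walk over the media.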
> In an endless loop - read page 3 of PEB-A.
> This will affect nearby pages (say 4 and 2 for simplicity). But if I
> scrub the whole PEB according to the read-counter I will save the data
> of pages 2 and 4.
> If I do nothing: when eventually reading page 4 it will produce
> bit-flips that may not be fixable.

This is quite an artificial example, but yes, if you read the same page
in a tight loop, you may cause bit-flips fast enough - faster than your
periodic read task starts reading your media.

But first of all, how realistic is this scenario? I am sure it is not
very realistic, especially if there is an FS on top of UBI and the data
are cached, so the second read actually comes from RAM.

Secondly, can this scenario be covered by simpler means? Say, UBI could
watch the read rate, and if it grows, trigger the scrubber task earlier
(see the sketch at the end of this mail)?

> > I understand the whole customer orientation concept. But for me so
> > far the solution does not feel like something suitable to a customer
> > I could imagine. I mean, if I think about me as a potential
> > customer, I would just want my data to be safe and covered from all
> > the NAND effects.
>
> I'm not sure that at the moment "all NAND effects" can be covered.

I explained how I see it above in this e-mail. In short: read all data
often enough ("enough" is defined by your product), and you are done.
All "NAND effects" lead to bit-flips; you fix the bit-flips faster than
they become hard errors, and you are done.
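
Coming back to the "watch the read rate" idea above, here is a hedged
sketch of what I mean (again not existing UBI code; the names and the
numbers are made up): keep the scrubber's full-media read pass on a
relaxed period while the device is idle, and shrink the period when the
observed read rate grows.

/* Hedged sketch: adapt the periodic-read interval to the read rate.
 * Nothing here is real UBI code; names and constants are made up.
 */
#include <stdio.h>

#define BASE_PERIOD_S   3600  /* relaxed period when the device is idle */
#define MIN_PERIOD_S      60  /* floor under a heavy read load */
#define READS_PER_S_REF  100  /* read rate considered "normal" */

/* Pick the interval until the next full-media read pass from the
 * observed read rate: the busier the media, the sooner we re-read it. */
static unsigned int next_read_period(unsigned long reads_last_period,
                                     unsigned int last_period_s)
{
        unsigned long rate = reads_last_period / (last_period_s ? last_period_s : 1);
        unsigned int period = BASE_PERIOD_S;

        if (rate > READS_PER_S_REF) {
                /* Scale the period down proportionally to the extra load. */
                period = BASE_PERIOD_S * READS_PER_S_REF / rate;
                if (period < MIN_PERIOD_S)
                        period = MIN_PERIOD_S;
        }
        return period;
}

int main(void)
{
        /* 100 reads/s keeps the relaxed period; 1000 reads/s cuts it
         * to a tenth (but never below the floor). */
        printf("idle: %u s\n", next_read_period(360000UL, 3600));
        printf("busy: %u s\n", next_read_period(3600000UL, 3600));
        return 0;
}

Whether the scaling should be linear or something cleverer is a
separate question; the point is only that the scrubber we already have
could be driven harder when the read load grows, without maintaining
per-PEB counters for it.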