On Tue, 2014-11-11 at 22:36 +0200, Tanya Brokhman wrote:
> Unfortunately none. This is done for a new device that we received just
> now. The development was done on a virtual machine with nandsim. Testing
> was more of stability and regression

OK. So the implementation is theory-driven and lacks experimental proof.
This means that building a product based on this implementation carries
a certain amount of risk. And from where I stand, the theoretical basis
for the solution also does not look very strong.

> > The advantages of the "read all periodically" approach were:
> >
> > 1. Simple, no modifications needed
> > 2. No need to write if the media is read-only, except when scrubbing
> > happens.
> > 3. Should cover all the NAND effects, including the "radiation" one.
>
> Disadvantages (as I see it):
> 1. performance hit: when do you trigger the "read-all"? will effect
> performance

Right. We do not know how often, just like we do not know how often and
how much (the read counter threshold) in your proposal.

Performance: sure, that is a matter of experiment, just like the
performance of your solution. And, as I note, energy too (read: battery
life). In your solution you have to do more work maintaining the counters
and writing them. With the read solution you do more work reading data.
The promise is that the reading may be done in the background, when there
is no other I/O.

> 2. finds bitflips only when they are present instead of preventing them
> from happening

But is this true? I do not see how this is true in your case. You want to
scrub by threshold, which is a theoretical value with a very large
deviation from the real one. And there may not even be a real one: it
depends on the erase block, on the I/O patterns, and on the temperature.
You will end up scrubbing a lot earlier than needed, and here comes the
performance loss too (and energy). And in some cases you will end up
scrubbing too late. I do not see how your solution provides any hard
guarantee.
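To make the argument concrete, here is a deliberately crude toy model in
Python. Everything in it is an assumption for illustration only: the ECC
strength, the scrub level, the per-PEB sensitivities, and the function
names are made up, not taken from UBI or from your patches. It models only
read disturb (no retention, no temperature, no disturb caused by the scan
itself), and the "periodic" scan is counted in reads to the same PEB
rather than in time. What it shows is that a single fixed read-counter
threshold cannot fit a weak PEB and a healthy PEB at the same time, while
a read scan reacts to the bitflips it actually observes.

```python
from dataclasses import dataclass

# All constants are hypothetical, chosen only to illustrate the argument.
ECC_STRENGTH = 8  # max bitflips the ECC can correct in this toy model
SCRUB_LEVEL = 6   # the read-scan policy scrubs once this many bitflips are seen


@dataclass
class Peb:
    name: str
    reads_per_bitflip: int  # toy sensitivity: worn/hot/defective PEBs are lower
    reads: int = 0          # reads since the last scrub

    def bitflips(self) -> int:
        # Toy read-disturb model: bitflips grow linearly with reads.
        return self.reads // self.reads_per_bitflip

    def scrub(self) -> None:
        # Scrubbing moves the data to a fresh PEB, so disturb starts over.
        self.reads = 0


def run_counter_policy(peb: Peb, total_reads: int, read_threshold: int):
    """Scrub when the read counter reaches a fixed threshold.
    Returns (data_lost, number_of_scrubs)."""
    scrubs = 0
    for _ in range(total_reads):
        peb.reads += 1
        if peb.bitflips() > ECC_STRENGTH:
            return True, scrubs  # uncorrectable before the threshold was hit
        if peb.reads >= read_threshold:
            peb.scrub()
            scrubs += 1
    return False, scrubs


def run_read_scan_policy(peb: Peb, total_reads: int, scan_period: int):
    """Every scan_period reads, read the PEB and scrub it if the observed
    bitflips reach SCRUB_LEVEL. Disturb caused by the scan is ignored.
    Returns (data_lost, number_of_scrubs)."""
    scrubs = 0
    for i in range(1, total_reads + 1):
        peb.reads += 1
        if peb.bitflips() > ECC_STRENGTH:
            return True, scrubs
        if i % scan_period == 0 and peb.bitflips() >= SCRUB_LEVEL:
            peb.scrub()
            scrubs += 1
    return False, scrubs


if __name__ == "__main__":
    for name, sensitivity in (("weak", 100), ("typical", 1000)):
        counter = run_counter_policy(Peb(name, sensitivity), 20000, 5000)
        scan = run_read_scan_policy(Peb(name, sensitivity), 20000, 200)
        print(f"{name}: counter={counter} scan={scan}")
```

With these made-up numbers, a threshold of 5000 reads (safe for the
"typical" PEB) lets the "weak" PEB become uncorrectable after 900 reads,
i.e. counter=(True, 0). The scan-based policy scrubs the weak PEB early
(33 times) and the typical one less often than the counter does (3 vs 4).
Of course, the scan period itself still has to be chosen by experiment,
which is exactly my point about both approaches.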
Please explain how you guarantee that my PEB does not bit-rot before its
read counter reaches the threshold. It may bit-rot earlier because it is
close to being worn out, or just because of a higher temperature, or
because it has a nano-defect.

> Perhaps our design is an overkill for this and not covering 100% of te
> usecases. But it was requested by our customers to handle read-disturb
> and data retention specifically (as in "prevent" and not just "fix").
> This is due to a new NAND device that should operate in high temperature
> and last for ~15-20 years.

I understand the whole customer-orientation concept. But so far the
solution does not feel like something suitable for any customer I could
imagine. I mean, if I think of myself as a potential customer, I would
just want my data to be safe and protected from all the NAND effects. I
would not want counters, I'd want the result. And in the proposed
solution I do not see how I would get the guaranteed result. But of
course I do not know the customer requirements that you've got.