On Fri 2009-08-28 07:46:42, david@xxxxxxx wrote:
> On Thu, 27 Aug 2009, David Woodhouse wrote:
>
>> On Mon, 2009-08-24 at 20:08 -0400, Theodore Tso wrote:
>>>
>>> (It's worse with people using Digital SLRs shooting in raw mode,
>>> since it can take upwards of 30 seconds or more to write out a 12-30MB
>>> raw image, and if you eject at the wrong time, you can trash the
>>> contents of the entire CF card; in the worst case, the Flash
>>> Translation Layer data can get corrupted, and the card is completely
>>> ruined; you can't even reformat it at the filesystem level, but have
>>> to get a special Windows program from the CF manufacturer to --maybe--
>>> reset the FTL layer.
>>
>> This just goes to show why having this "translation layer" done in
>> firmware on the device itself is a _bad_ idea. We're much better off
>> when we have full access to the underlying flash and the OS can actually
>> see what's going on. That way, we can actually debug, fix and recover
>> from such problems.
>>
>>> Early CF cards were especially vulnerable to
>>> this; more recent CF cards are better, but it's a known failure mode
>>> of CF cards.)
>>
>> It's a known failure mode of _everything_ that uses flash to pretend to
>> be a block device. As I see it, there are no SSD devices which don't
>> lose data; there are only SSD devices which haven't lost your data
>> _yet_.
>>
>> There's no fundamental reason why it should be this way; it just is.
>>
>> (I'm kind of hoping that the shiny new expensive ones that everyone's
>> talking about right now, that I shouldn't really be slagging off, are
>> actually OK. But they're still new, and I'm certainly not trusting them
>> with my own data _quite_ yet.)
>
> so what sort of test would be needed to identify if a device has this
> problem?
>
> people can do ad-hoc tests by pulling the devices in use and then
> checking the entire device, but something better should be available.
>
> it seems to me that there are two things needed to define the tests:
>
> 1. a predictable write load, so that it's easy to detect data getting
>    lost
>
> 2. some statistical analysis to decide how many device pulls are needed
>    (under the write load defined in #1) to make the odds high that the
>    problem will be revealed.

It's simpler than that. It usually breaks after the third unplug or so.

> for USB devices there may be a way to use the power management functions
> to cut power to the device without requiring it to physically be pulled.
> If this is the case (even if this only works on some specific chipsets),
> it would drastically speed up the testing.

This is really so easy to reproduce that such a speedup is not necessary.
Just try the scripts :-).

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
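
[Editor's sketch] For readers who want to try the kind of "predictable write
load" david@xxxxxxx describes in point #1 above: the following is a minimal,
hypothetical Python sketch, not the actual scripts Pavel refers to. The record
size, file path, and two-phase write/check structure are assumptions made for
illustration only.

#!/usr/bin/env python3
# Hypothetical sketch of a predictable write load for hot-unplug testing.
# Phase "write": keep appending fixed-size records to a file on the mounted
# flash device; each record carries its sequence number and a SHA-256 digest,
# so lost or corrupted records are detectable afterwards.
# Phase "check": after pulling and re-plugging the device, verify that the
# surviving records form an uncorrupted prefix of the sequence.

import hashlib
import os
import struct
import sys

RECORD_SIZE = 4096                      # one record per "block" (assumed size)
HEADER = struct.Struct("<Q32s")         # sequence number + SHA-256 digest

def make_record(seq: int) -> bytes:
    payload = os.urandom(RECORD_SIZE - HEADER.size)
    digest = hashlib.sha256(payload).digest()
    return HEADER.pack(seq, digest) + payload

def write_load(path: str) -> None:
    """Append records until the device is pulled (or Ctrl-C)."""
    seq = 0
    with open(path, "wb") as f:
        while True:
            f.write(make_record(seq))
            f.flush()
            os.fsync(f.fileno())        # push each record toward the device
            seq += 1

def check(path: str) -> None:
    """Report how many records survived and whether any are corrupted."""
    good = 0
    with open(path, "rb") as f:
        while True:
            rec = f.read(RECORD_SIZE)
            if len(rec) < RECORD_SIZE:
                break                   # a short tail is expected after a pull
            seq, digest = HEADER.unpack(rec[:HEADER.size])
            payload = rec[HEADER.size:]
            if seq != good or hashlib.sha256(payload).digest() != digest:
                print(f"corruption at record {good} (claims seq {seq})")
                return
            good += 1
    print(f"{good} records intact")

if __name__ == "__main__":
    mode, path = sys.argv[1], sys.argv[2]   # e.g. write /mnt/usbstick/testfile
    write_load(path) if mode == "write" else check(path)

Run "write" against a file on the mounted device, pull the device mid-run,
re-plug and remount it, then run "check" on the same file. Records that were
already fsynced but come back missing or corrupted are exactly the failure
mode being discussed in the thread.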