>> Are those tests in xfstests or any other public repository?
>
> crash/xfscrash is, and now dm-log-write, but nothing else is.

There's also CrashMonkey (a tool under active development in my research
group at UT Austin) at: https://github.com/utsaslab/crashmonkey.

>> My first reaction to the corruption was "no way, I need to check the test"
>> Second reaction after checking the test was "this must be very very hard to hit"
>> But from closer inspection, it looks like it doesn't take more than running
>> a couple of fsync in parallel to get to the "at risk" state, which may persist
>> for seconds.
>
> That may be the case, but the reality is we don't have a body of
> evidence to suggest this is a problem anyone is actually hitting. In
> fact, we don't have any evidence it's been seen in the wild at all.

Even if people did hit this bug in the wild, as Amir suggested before,
the chances of them connecting it to a crash-consistency bug in the file
system and reporting it as such are extremely low. We do have many
reports of data loss in newer file systems like btrfs, but I don't think
those are connected back to crash-consistency bugs (my suspicion is that
there are many such bugs, simply due to inadequate crash-consistency
testing).

When working on CrashMonkey, we find it hard to understand what a given
crash-consistency bug actually is, even when we are the ones producing
the bug! I think there is a need for good debugging/investigative tools
in this area (something we are also working on).

Thanks,
Vijay
http://www.cs.utexas.edu/~vijay/