On Thu, Nov 30, 2017 at 08:51:45AM -0600, Ashlie Martinez wrote: > > Even though CrashMonkey *records* all the disk operations, it doesn't > have to replay all of them when generating crash states. For example, > it could choose to fully replay (and preserve the ordering of) all > operations before the 3rd barrier operation in a trace with 5 > different barrier operations in it (we dub each set of operations from > just after the previous barrier operation up to and including the next > barrier operation a "disk write epoch" or "disk epoch"). OK, but then it won't replay any operations after the 4th barrier opration, correct? And in the case where you are stopping somewhere between the 3rd and the 4th replay, you will drop and reorder random operations after the 3rd barrier op, but before the 4th, correct? So what would be good is to understand where it stopped replaying operations ---- and if you can get a strace -ttt of the workload, and the fine-grained timestamps from block I/O trace, so we can understand how far we had gotten in the workload. And then, to also to include the output of debugfs's "logdump -ac" command before the journal is replayed. >From a file system developer's perspective, what is most useful is when we can see the minimal reproduction test case where Crashmonkey is only rearranging block I/O's of the last full "disk write epock". And getting the timestampped strace logs and block I/O logs will help us do that. Otherwise, we have no idea where to look for a potential problem, only that it's one of the several I/O commands. Anyway, looking back at your original question, is your question why the first write isn't delay allocated? That's because the collapse_range operation needs to resolve any delayed allocation writes on the portion of the extent tree which will be affected by the collapse_range operation. See the calls to filemap_write_and_wait_range() in ext4_collapse_range(). Note that if you want to try to understand what is going on, there are a large number of ext4 and jbd2 tracepoints. Enabling these tracepoints (you may need to omit some of the much more chatty jbd2 trace points from the ones that you enable) and dumping those timestamps alongside the strace -ttt and block I/O timestamps should be especially illuminating. Cheers, - Ted P.S. Another set of tracepoints that might be useful for understanding when delayed write allocations are getting resolved are the writeback tracepoints --- although you can probably infer those from the ext4_writepages traces, since when the writeback is triggered this will trigger calls to ext4_writepages.