I've filed the issue on GitHub, but thought I'd mention it here too: https://github.com/axboe/fio/issues/1457

In real-world use it appears to be intermittent. I'm not yet sure how intermittent, but I could see it being used in production and not caught right away. I got lucky and stumbled on it while looking at graphs of runs, where I noticed 15 seconds of no activity. With the null ioengine I can reproduce it very reliably, which is encouraging as I move on to debugging.

I had just moved to using log compression, as it is really powerful and the only way to store per-I/O logs for a long run without pushing up against the amount of physical memory in a system. (Without compression, a GB of sequential writes at 128K block size is on the order of 245KB of memory per log, so a TB is 245MB per log. Now run a job to fill a 20TB drive and you're at 4.9GB for one log file. If you record all 3 latency numbers too, you're talking close to 20GB.)
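For anyone who wants to plug in their own drive size, here is a rough Python sketch of that arithmetic. It simply scales my 245KB-per-GB figure linearly in decimal units. The function name and the count of four logs (one log plus the three latency logs) are just my assumptions for illustration, not anything taken from fio itself.

```python
# Rough estimate of uncompressed per-I/O log memory, scaled linearly from
# the figure quoted above: about 245 KB of memory per GB written, per log,
# at 128K block size. Decimal units throughout (1 GB = 1e6 KB).
KB_PER_GB_WRITTEN = 245  # figure from the post, not measured here


def log_memory_gb(gb_written, num_logs=1):
    """Approximate in-memory log size in GB (hypothetical helper)."""
    return gb_written * KB_PER_GB_WRITTEN * num_logs / 1e6


one_log = log_memory_gb(20_000)        # fill a 20 TB drive, one log
four_logs = log_memory_gb(20_000, 4)   # assumption: one log plus 3 latency logs

print(f"one log: {one_log:.1f} GB, four logs: {four_logs:.1f} GB")
```

The per-GB figure depends on block size (smaller blocks mean more entries per GB), so treat the constant as specific to the 128K case.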